A Generalized Convergence Theorem for Neural Networks
Abstract: A neural network model is presented in which each neuron performs a threshold logic function. An important property of the model is that it always converges to a stable state when operating in a serial mode and to a cycle of length at most 2 when operating in a fully parallel mode. This property is the basis for the potential applications of the model, such as associative memory devices and combinatorial optimization. The two known convergence theorems (for serial and fully parallel modes of operation) are reviewed, and a general convergence theorem is presented which unifies the two known cases. Some new applications of the model for combinatorial optimization are also presented, in particular, new relations between the neural network model and the problem of finding a minimum cut in a graph.


I. INTRODUCTION 


The neural network model is a discrete-time system that can be represented by a weighted and undirected graph. A weight is attached to each edge of the graph and a threshold value is attached to each node (neuron) of the graph. The order of the network is the number of nodes in the corresponding graph. Let $N$ be a neural network of order $n$; then $N$ is uniquely defined by $(W, T)$, where

• $W$ is an $n \times n$ symmetric matrix, where $W_{ij}$ is the weight attached to edge $(i, j)$;

• $T$ is a vector of dimension $n$, where $T_i$ denotes the threshold attached to node $i$.

Every node (neuron) can be in one of two possible states, either 1 or $-1$. The state of node $i$ at time $t$ is denoted by $V_i(t)$. The state of the neural network at time $t$ is the vector $V(t)$.
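The next state of a node is computed by

$$V_i(t+1) = \operatorname{sgn}(H_i(t)) = \begin{cases} 1, & \text{if } H_i(t) \ge 0 \\ -1, & \text{otherwise} \end{cases} \qquad (1)$$

where

$$H_i(t) = \sum_{j=1}^{n} W_{ij} V_j(t) - T_i.$$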


The next state of the network, i.e., $V(t+1)$, is computed from the current state by performing the evaluation (1) at a set $S$ of the nodes of the network. The modes of operation are determined by the method by which the set $S$ is selected in each time interval. If the computation is performed at a single node in any time interval, i.e., $|S| = 1$, then we say that the network is operating in a serial mode; if $|S| = n$, then we say that the network is operating in a fully parallel mode. All the other cases, i.e., $1 < |S| < n$, will be called parallel modes of operation. The set $S$ can be chosen at random or according to some deterministic rule.

A state $V(t)$ is called stable if and only if $V(t) = \operatorname{sgn}(WV(t) - T)$, i.e., no change occurs in the state of the network regardless of the mode of operation.
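
The update rule (1), the choice of the set $S$, and the stability test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `sgn`, `net_input`, `step` and `is_stable`, and the two-node example network, are introduced here and do not come from the paper.

```python
def sgn(h):
    # Convention of (1): a net input of exactly 0 maps to +1.
    return 1 if h >= 0 else -1


def net_input(W, T, V, i):
    # H_i(t) = sum_j W[i][j] * V[j] - T[i]
    return sum(W[i][j] * V[j] for j in range(len(V))) - T[i]


def step(W, T, V, S):
    """Evaluate (1) at every node in S, always reading the current state V."""
    nxt = list(V)
    for i in S:
        nxt[i] = sgn(net_input(W, T, V, i))
    return nxt


def is_stable(W, T, V):
    # A state is stable when a fully parallel evaluation changes nothing.
    return step(W, T, V, range(len(V))) == list(V)


# Hypothetical two-node network: one edge of weight 1, zero thresholds.
W = [[0, 1], [1, 0]]
T = [0, 0]
V = [1, -1]

serial = step(W, T, V, [0])        # serial mode, |S| = 1
parallel = step(W, T, V, [0, 1])   # fully parallel mode, |S| = n
```

In this small example the serial update reaches a stable state, while the fully parallel update alternates between two states, which matches the two behaviours described for the model.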

An important property of the model is that it always converges to a stable state when operating in a serial mode and to a cycle of length at most 2 when operating in a fully parallel mode [3], [5]. Section II contains a description of these convergence properties and a general convergence theorem which unifies the two known cases. New relations between the energy functions which correspond to the serial and fully parallel modes are presented as well.

The convergence properties are the basis for the application of the model in combinatorial optimization. In Section III we describe the potential applications of a neural network model as a local search device for the two modes of operation, that is, serial mode and fully parallel mode. In particular, we show that an equivalence exists between finding a maximal value of the energy function and finding a minimum cut in an undirected graph, and also that a neural network model can be designed to perform a local search for a minimum cut in a directed graph.


II. CONVERGENCE THEOREMS 


An important property of the model is that it always converges, as summarized by the following theorem.


Theorem 1: Let $N = (W, T)$ be a neural network, with $W$ being a symmetric matrix; then the following hold.

1) Hopfield [5]: If $N$ is operating in a serial mode and the elements of the diagonal of $W$ are nonnegative, the network will always converge to a stable state (i.e., there are no cycles in the state space).

2) Goles [3]: If $N$ is operating in a fully parallel mode, the network will always converge to a stable state or to a cycle of length 2 (i.e., the cycles in the state space are of length $\le 2$).


The main idea in the proof of the two parts of the theorem is to define a so-called energy function and to show that this energy function is nondecreasing when the state of the network changes. Since the energy function is bounded from above, the energy will converge to some value. Note that, originally, the energy function was defined so that it is nonincreasing [3], [5]; we changed it to be nondecreasing in accordance with some known graph problems (see, e.g., min cut in the next section).

The second step in the proof is to show that constant energy implies in the first case a stable state and in the second a cycle of length $\le 2$. The energy functions defined for each part of the proof are different:


$$E_1(t) = V^T(t)\, W\, V(t) - (V(t) + V(t))^T T$$
$$E_2(t) = V^T(t)\, W\, V(t-1) - (V(t) + V(t-1))^T T \qquad (2)$$


where $E_1(t)$ and $E_2(t)$ denote the energy functions related to the first and second part of the proof.
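
As a small numerical check of the two energy functions in (2), the following Python sketch evaluates $E_1$ along a serial sweep and $E_2$ along a fully parallel 2-cycle. The function names and both example networks are assumptions made for illustration; only the formulas are taken from the text.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quad(W, a, b):
    # a^T W b
    return sum(a[i] * W[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def energy_serial(W, T, V):
    # E_1(t) = V^T W V - (V + V)^T T
    return quad(W, V, V) - 2 * dot(V, T)


def energy_parallel(W, T, V_now, V_prev):
    # E_2(t) = V(t)^T W V(t-1) - (V(t) + V(t-1))^T T
    return quad(W, V_now, V_prev) - dot([a + b for a, b in zip(V_now, V_prev)], T)


def serial_update(W, T, V, i):
    # Rule (1) applied at the single node i.
    h = sum(W[i][j] * V[j] for j in range(len(V))) - T[i]
    nxt = list(V)
    nxt[i] = 1 if h >= 0 else -1
    return nxt


# Hypothetical symmetric network with zero diagonal and zero thresholds.
W = [[0, 1, -2], [1, 0, 3], [-2, 3, 0]]
T = [0, 0, 0]
V = [1, 1, 1]
energies = [energy_serial(W, T, V)]
for i in range(3):                      # one serial sweep over the nodes
    V = serial_update(W, T, V, i)
    energies.append(energy_serial(W, T, V))

# Hypothetical 2-cycle of a two-node network in fully parallel mode.
W2, T2 = [[0, 1], [1, 0]], [0, 0]
a, b = [1, -1], [-1, 1]
cycle_energies = (energy_parallel(W2, T2, b, a), energy_parallel(W2, T2, a, b))
```

In this example $E_1$ never decreases during the serial sweep, and $E_2$ takes the same value at both steps of the 2-cycle, in line with the proof idea described above.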

An interesting question is whether two different energy func- 
tions are needed to prove the two parts of Theorem 1. A new 
result is that convergence in the fully parallel mode can be 
proven using the result on convergence for the serial mode of 
operation. For the sake of completeness, the proof for the case of 
a serial mode of operation follows. 


Proof of the First Part of Theorem 1: Using the definitions in (1) and (2), let $\Delta E = E_1(t+1) - E_1(t)$ be the difference in the energy associated with two consecutive states, and let $\Delta V_k$ denote the difference between the next state and the current state of
that they make it possible to encode predicates of order higher than 1, provided that the network is taught the behavior expected of it. In this model there are no connections between cells of the same layer, and the connections from one layer to the next are oriented from input to output. Among other things, this technique makes pattern recognition possible. As with the perceptron, the weights are modified during learning. The transition function of automaton $i$ has the form $F_i(\sum_j W_{ij} X_j)$, where the $X_j$ are the input signals and $F_i$ is differentiable.

Let $X_k$ be the input vector representing the pattern to be recognized. Starting from the input $X_k$, the state of the network is computed, with the input signals propagating from layer to layer.

The desired output $Y_k$ for the input $X_k$ is presented at the output. For each cell, from the input layer toward the output layer, step by step, a measure $\varepsilon$ of the error is computed, which is used to adjust the connection weights.













If $s$ is an output cell:

$$\varepsilon_s = 2(S_s - Y_s)\, F'\Big(\sum_j W_{sj} X_j\Big)$$

Otherwise:

$$\varepsilon_i = F'\Big(\sum_j W_{ij} X_j\Big) \sum_k W_{ki}\, \varepsilon_k$$

where $S$ is the output obtained from the input $X$ and $Y$ is the desired output. $F$ is the transition function. Computing the error signals by backpropagation, from the last layer to the first, makes it possible to compute the new weights step by step, from layer $N$ to layer 0:

$$W_{ij} = W_{ij} - e\, \varepsilon_i X_j,$$

starting from the weights obtained for the previous example.
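
The error signals and the weight update above can be sketched for a tiny network with one hidden layer. Everything specific in this Python sketch is an assumption made for illustration: the logistic transition function `F`, the layer sizes, the initial weights, the training example and the step size `rate` (which plays the role of $e$).

```python
import math


def F(a):
    # Assumed transition function: the logistic sigmoid (differentiable).
    return 1.0 / (1.0 + math.exp(-a))


def dF(a):
    s = F(a)
    return s * (1.0 - s)


def forward(W_hid, W_out, x):
    net_h = [sum(w * xj for w, xj in zip(row, x)) for row in W_hid]
    h = [F(a) for a in net_h]
    net_o = [sum(w * hj for w, hj in zip(row, h)) for row in W_out]
    out = [F(a) for a in net_o]
    return net_h, h, net_o, out


def squared_error(W_hid, W_out, x, y):
    out = forward(W_hid, W_out, x)[3]
    return sum((s - t) ** 2 for s, t in zip(out, y))


def train_step(W_hid, W_out, x, y, rate):
    net_h, h, net_o, out = forward(W_hid, W_out, x)
    # Output cells: eps_s = 2 (S_s - Y_s) F'(net_s)
    eps_out = [2.0 * (s - t) * dF(a) for s, t, a in zip(out, y, net_o)]
    # Hidden cells: eps_i = F'(net_i) * sum_k W_ki eps_k (uses the old weights)
    eps_hid = [dF(net_h[i]) * sum(W_out[k][i] * eps_out[k] for k in range(len(W_out)))
               for i in range(len(W_hid))]
    # Weight update: W_ij <- W_ij - rate * eps_i * X_j
    for k, row in enumerate(W_out):
        for i in range(len(row)):
            row[i] -= rate * eps_out[k] * h[i]
    for i, row in enumerate(W_hid):
        for j in range(len(row)):
            row[j] -= rate * eps_hid[i] * x[j]


# Hypothetical 2-2-1 network trained on a single example.
W_hid = [[0.1, -0.2], [0.3, 0.4]]
W_out = [[0.5, -0.5]]
x, y = [1.0, 0.0], [1.0]
before = squared_error(W_hid, W_out, x, y)
for _ in range(200):
    train_step(W_hid, W_out, x, y, rate=0.5)
after = squared_error(W_hid, W_out, x, y)
```

The hidden error signals are computed before any weight is changed, so that each update uses the weights of the current pass.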

Once the learning phase is complete, the weights are correctly adjusted for the various training examples, and the network can then be asked to recognize noisy patterns that it has never "seen".

This method has many applications in pattern recognition, where the "right" answer can be presented to the network together with the example to be recognized. This is not always possible, however, and one may expect the network to give answers for which it has itself worked out the method of production. This is what networks with unsupervised learning can do.


Applications


All the developments we have presented here are, as you will have understood, theoretical. We will not confine ourselves to these discussions. In our next issue you will find





« Network Learning » (F. Fogelman Soulié, Y. Le Cun, P. Gallinari, S. Thiria). To appear in Machine Learning, vol. 3, Kodratoff, R. Michalski.

« An Introduction to Computing with Neural Nets » (Richard P. Lippmann), in IEEE ASSP Magazine, April 1987.

« De nouvelles voies vers l'IA » (J.C. Perez), Masson, March 1988.

« Vers le neuro-ordinateur » (C. Durand), in Micro-Systèmes, October 1987.

« Le cerveau artificiel va-t-il nous dépasser ? » (Philippe Chambon), in Actuel, July-August 1988.

« Self-organizing Feature Maps and the Travelling Salesman Problem » (B. Angeniol, G. de la Croix Vau-





Bibliography











































three applications written in Turbo C, which will accompany the final examples of theoretical methods.
The perceptron described here will be put to work, a Hopfield network will learn elementary patterns for you and identify them in a "noisy" area, and an intelligent system for finding the ideal route will be implemented.
Claire Nedellec and P. Chassany


bois, J.Y. Le Texier), Thomson-CSF/DSE, Bagneux.

« Un modèle connexionniste pour la réduction du bruit de reconstruction tomographique » (C. Obellianne, G. Galibourg, EHEI)*.
« Reconnaissance de la parole par réseaux multicouches » (L. Y. Bottou, EHEI)*.
« Reconnaissance de bruits acoustiques sous-marins par réseaux multicouches » (M. de Bollivier, A. Lemer, EHEI)*.


* To appear in Neuro-Nîmes 1988, Proceedings, Nîmes, 15-17 November 1988.

« Self-organization and Associative Memory » (T. Kohonen), Springer-Verlag, second edition.
T. Kohonen's technique of self-organizing maps is one example. Like the previous methods, it can include a learning phase and a generalization phase. Its learning is called "unsupervised", however, because the examples are presented to the network without the corresponding answers. This method, proposed by T. Kohonen, has a property that makes it attractive for solving certain problems.

Contrary to what happens in supervised learning methods, the network organizes itself spontaneously according to the inputs, and the internal representation can be interpreted because it is directly related to the representation of the input information. For each new example, the network organizes itself so as to represent the topology as closely as possible, relative to the model.








The travelling salesman problem


This method is particularly well suited to


Connectionist models


We saw in our previous issue how networks with "supervised learning" can be taught to recognize patterns. Neural networks with unsupervised learning "know" [...] and try to work out answers for themselves...











field of application, the principle remains the same: a reference "map" is built, and one tries to map the nodes of the network onto the various points of the example. For the travelling salesman problem, at each iteration the nodes of the network move closer to the cities, until each node corresponds to a city (fig. 1); the result can then be read directly off the network.

Let $M$ be the number of cities, with coordinates $(X_i, Y_i)$. Initially a single node is created, with coordinates in the plane $C_1 = 0$, $C_1' = 0$. Each node will have two neighbors. A "tour" consists of examining all the cities, one by one. For a city $i$, we look for the closest node $j_c$, the one with minimal distance $D_{ij}$, where

$$D_{ij} = (X_i - C_j)^2 + (Y_i - C_j')^2.$$

This node $j_c$ and its neighbors are moved closer to city $i$ by taking

$$C_j = C_j + f(G, n)\,(X_i - C_j)$$

and

$$C_j' = C_j' + f(G, n)\,(Y_i - C_j'),$$

so that the displacement is proportional to the distance between node $j$ and city $i$, with

$$n = \inf\big(j - j_c \ (\mathrm{mod}\ N),\ j_c - j \ (\mathrm{mod}\ N)\big),$$

where $N$ is the number of nodes, and

$$f(G, n) = \frac{1}{\sqrt{2}} \exp\left(-\frac{n^2}{G^2}\right).$$

The gain $G$ is an important factor; it determines how many nodes move for each city. For $G \to \infty$, all the nodes "move"; for $G \to 0$, only the node closest to the city moves. It is useful to take $G = (1 - \alpha)\,G$, so that the gain decreases: the closer one gets to the solution, the less the nodes have to move.
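
The presentation of one city to the ring of nodes can be sketched as follows. This Python sketch is an illustration under stated assumptions: the function names, the four-node ring and the city coordinates are invented here, and the factor $1/\sqrt{2}$ in `gain_factor` is a reading of a poorly legible formula in the source. Node creation and deletion are left out.

```python
import math


def gain_factor(G, n):
    # f(G, n) = (1 / sqrt(2)) * exp(-n^2 / G^2); the 1/sqrt(2) factor is an assumption.
    return (1.0 / math.sqrt(2.0)) * math.exp(-(n * n) / (G * G))


def ring_distance(j, jc, N):
    # n = inf(j - jc (mod N), jc - j (mod N)): distance measured along the ring.
    return min((j - jc) % N, (jc - j) % N)


def squared_distance(node, city):
    return (city[0] - node[0]) ** 2 + (city[1] - node[1]) ** 2


def closest_node(nodes, city):
    return min(range(len(nodes)), key=lambda j: squared_distance(nodes[j], city))


def present_city(nodes, city, G):
    """Move every node toward the city, weighted by its ring distance to the winner."""
    jc = closest_node(nodes, city)
    N = len(nodes)
    moved = []
    for j, (cx, cy) in enumerate(nodes):
        f = gain_factor(G, ring_distance(j, jc, N))
        moved.append((cx + f * (city[0] - cx), cy + f * (city[1] - cy)))
    return jc, moved


def decay(G, alpha):
    # G <- (1 - alpha) * G, applied between tours.
    return (1.0 - alpha) * G


# Hypothetical ring of four nodes and one city.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
city = (0.9, 0.1)
winner, new_nodes = present_city(nodes, city, G=1.0)
```

With a small gain the factor `gain_factor(G, n)` is negligible for every node except the winner, which corresponds to the limit $G \to 0$ described above.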

Initially there is a single node, but a node is duplicated if it has been chosen as the closest node to two different cities during the same tour. The new node has the same coordinates and becomes the neighbor of its "parent". A node that has not been chosen during three complete tours is deleted.

The final solution depends on the order in which the cities are examined, and on the speed of convergence determined by $\alpha$: for small $\alpha$, convergence is slow but the result is better than for large $\alpha$ (see the table below).


speech recognition, for which phonemic maps can be generated (T. Kohonen), to the recognition of electrophoresis images, for example (M. Keller, F. Fogelman-Soulié), and to the travelling salesman problem (B. Angeniol). Let us take this last example to illustrate the algorithm, implemented at Thomson-CSF/DSE. Whatever the

February 1989








Table: results for 50 cities distributed [...] over a 1 by 1 square (R. Durbin and D. [...], 1987).
a - best results of R. Durbin [...]
b - results obtained by the method [...]
c - best results obtained by B. Angeniol with self-organizing maps, with the number of times these results appeared in 4,000 trials
d - mean of the best results over 10 trials
e - mean over one trial for $\alpha$ = 0.2
f - mean for $\alpha$ = 0.02

It can be seen that it is more worthwhile to take a large $\alpha$ (0.2), run 10 trials and keep the best one, than to take a small $\alpha$ (0.02) for a single trial, with a similar convergence time.






The advantage of this method for the travelling salesman problem is that the number of nodes is proportional to the number of cities, and not to the square of the number of cities as in the Hopfield algorithm presented above; moreover, a very acceptable result is obtained in a reasonable time (fig. 2).

The value of studying the travelling salesman problem is that it has interesting similarities with other problems, such as resource allocation or the placement of processors on a chip. The Thomson-CSF laboratory has also taken an interest in the physical mapping of formal neural networks onto interconnected processors. The task is to assign tasks that communicate with one another to a network of processors in such a way that the total amount of information exchanged is minimal. The problem has similarities









possible the communications


"A travelling salesman chooses his route logically. The network 'self-organizes' spontaneously and tries to follow a similar logic."


between the different parts (fig. 3). The application to database management systems is direct: the problem is to distribute the data across different storage media so that the amount of information to be exchanged is minimal.

Kohonen's topological map model finds applications in


with the previous problem, the tasks being the cities, and the amount of data to be exchanged between two tasks being the distance between two cities.

The value of this algorithm lies in the fact that it allows different formal networks to be used on the same physical network: the wiring is no longer dedicated to one application but is made "adaptable" to many networks through the use of this method. It makes it possible to reorganize the assignment of processors while they are in use, for example if connections were to disappear (Changeux). Other applications are being developed in the fields of classification and graph partitioning. The aim is to find the best partition of a graph, so as to limit as far as
Part 1










fields where it is useful for the representation in the network to preserve the topology of the data.

O: the cities. X: the nodes.

A: Number of iterations: 0. Number of nodes: 15.

B: Number of iterations: 5. Number of nodes: 58.











Part 2

Fig. 3
A difficult problem: recognizing speech...


In the same laboratory, Léon Bottou applies the gradient backpropagation method to speech recognition. Two properties of networks, adaptability and resistance to noise, have proved very valuable for this problem, which is considered difficult even by "connectionists".

A multilayer network with two hidden layers is used in a particular way: the cells of the second, third and last layers are connected "temporally" to the cells of the preceding layer, so that the signals received by these cells at time $t$ correspond to the state of the emitting cells over a time interval around $t$ (fig. 7).

The network is able to recognize 20 isolated words, spoken by 4 different speakers, with a success rate of 94%, the training being carried out on 400 distortions of the 20 words to be recognized (fig. 8).

These experiments, limited to a small database, are very encouraging; the performance is comparable to that obtained with classical techniques by LIMSI (Laboratoire d'informatique pour la mécanique et les sciences de l'ingénieur) in Orsay. Research is continuing on a set of 135 words and 25 speakers, where the gradient backpropagation algorithm should prove more effective.





Fig. 8. Spectrogram of a word that the network must recognize. (Word labels in the figure: UN, DEUX, TROIS, QUATRE, CINQ, SIX.)
- class 1: line not blurred and unstable
- class 2: line not blurred and stable
- class 3: line blurred and unstable
- class 4: line blurred and stable








... and underwater acoustic noise


The gradient backpropagation algorithm is also used by M. de Bollivier at the EHEI laboratory and by J. Tanguy of Thomson-Sintra ASM in Arcueil, for the recognition of underwater acoustic noise. As in the previous application, it is the spectrogram of the sounds that is processed by a multilayer network. Four types of spectrogram patterns (fig. 9) are taught to the network, which can then classify new patterns. Research currently focuses on classifying the spectral lines in the spectrograms of signals, not the whole spectrogram. Until now this classification has been done by operators, and it has proved difficult to automate with classical methods. But "in the case of particularly blurred data, a [...] neural system could simply help the operator determine whether or not a line is present", or even "be used as a symbolic extractor supplying data to an expert system that helps the operator obtain relevant information about a noise source" (M. de Bollivier).

The results obtained are satisfactory, whether the lofars are cut or interrupted (75% success in generalization) or not (87.5% success in generalization) (fig. 10), but the problem is simplified compared with reality (the lines could cross, or be oblique with a change of direction...). The research should be continued on more realistic signals, in order to arrive at an application that can be used under normal conditions.

These applications show how connectionist models provide interesting solutions to complex pattern recognition problems and to certain optimization problems, such as the travelling salesman problem. Although connectionist applications are currently at the research stage, industrial applications should soon appear.

The value of connectionism lies in the ability of networks to learn and to process noisy or incomplete information. The weak point remains the difficulty of interpreting the state of a network, since the information in it is, by definition, distributed. One can imagine that one day an intelligent system will be designed in which networks form the lower layer and an expert system forms the upper layer. The lower layer can process elementary information without distorting it, and the upper layer can interpret and manipulate the concepts produced by the lower layer. This "cooperation" between connectionist networks and expert systems remains a dream, and research in artificial intelligence continues in both fields side by side without their meeting yet.






Fig. 9. Lofars. From left to right and top to bottom: classes 1 to 4.
Qualitative Analysis and Synthesis of a
Class of Neural Networks


JIAN HUA LI, ANTHONY N. MICHEL, FELLOW, IEEE, AND
WOLFGANG POROD, MEMBER, IEEE


Abstract —In the present paper we investigate the dynamic properties of 
a class of neural networks (which includes the Hopfield model as a special 
case) by studying the qualitative behavior of equilibrium points. Our 
results fall into one of two categories; one type of results pertains to 
analysis (e.g., stability properties of an equilibrium, asymptotic behavior of 
solutions, etc.) while the second type of result pertains to synthesis (e.g., 
the design of a neural network with prespecified equilibrium points which 
are asymptotically stable). Most (but not all) of the results presented 
herein are global. We demonstrate the applicability of our results by means 
of a specific example. 


I. INTRODUCTION 


IN THE PRESENT paper we consider a class of nonlinear, autonomous, ordinary differential equations of the form

$$\dot{x} = -H(x)(-Tx + S(x) - I). \qquad (L)$$

We will define all symbols in (L) at a later point. Here it suffices to state that $x$ is an $n$-vector, $\dot{x}$ denotes the derivative of $x$ with respect to time $t$, $H(x)$ is a matrix-valued function, $T$ is a matrix, $S(x)$ is a vector-valued function, and $I$ denotes an input vector.

This system of equations can be used to model neural 
networks. With an appropriate set of assumptions, which 
we will give later, the system (L) will constitute a slight 
generalization of the Hopfield model [1]. 

The results which we establish for system (L) fall into 
one of two categories. One type of result addresses the 
analysis of system (L) while the other type pertains to 
synthesis procedures for system (L). In the following, we 
give a brief summary of the results developed herein: 

Analysis 

a) We show that system (L) possesses unique solutions which exist for all $t \ge 0$.

b) We associate with system (L) an energy function $E$ and we show that $E$ decreases monotonically along nonequilibrium solutions of (L) as $t$ increases, and that each nonequilibrium solution of (L) tends to an equilibrium of (L) as $t$ becomes large.

c) We show that for (L) there are only a finite number 
of equilibrium points. 


d) We show that if $\bar{x}$ is a stable equilibrium for (L), then it is also asymptotically stable.

e) We show that an (asymptotically) stable equilibrium $\bar{x}$ is a local minimum of $E$.

f) We identify $2^n$ regions in $n$-space and we show that in each of these regions there is at most one asymptotically stable equilibrium of system (L).

g) We establish an upper bound for the number of asymptotically stable equilibrium points of (L), under certain restrictions.


Synthesis 

Suppose that we are given information in the form of a collection of $m$ vectors which we wish to store as (asymptotically stable) equilibrium points of a neural network of the form (L), satisfying certain reasonable conditions. We address the following questions: 1) Can the specified information vectors be stored as (asymptotically stable) equilibrium points of system (L)? 2) If the answer to 1) is affirmative, how can this be accomplished? To answer these questions, we obtain the following:

h) We give a simple criterion to check whether a given set of vectors can be stored as equilibrium points of (L).

i) Under the condition that this criterion is satisfied, we develop a synthesis procedure to store these vectors as asymptotically stable equilibrium points of (L).

A few of our results are not particularly surprising and have been either hypothesized or taken as fact without proof in several existing works on neural networks. We have included these results to lay a proper foundation and for purposes of completeness. Also, a few of our results parallel existing ones which, however, pertain to models that are not compatible with our model (nor with the Hopfield model). We will identify these cases as they arise. It is emphasized that the principal results of the present paper appear to be new.

The structure of this paper is as follows. In Section II, we establish the notation used throughout this paper and we present essential background material. In Section III, we present the system which we study and we enumerate the assumptions made for this system. In Section IV, we state and prove the results summarized above under the category Analysis while in Section V, we establish the results and procedure summarized above under the category Synthesis. The applicability of the results of Section V is demonstrated by means of a specific example in Section VI. The paper is concluded with a few pertinent remarks in Section VII.


II. NOTATION AND PRELIMINARIES 


The present section consists of two parts. First, we establish the notation used throughout this paper. Next, we provide some essential preliminary results.


A. Notation


Let $V$ and $W$ be arbitrary sets. Then $V \cup W$, $V \cap W$, $V - W$ and $V \times W$ denote the union, intersection, difference and Cartesian product of $V$ and $W$, respectively. If $V$ is a subset of $W$, we write $V \subset W$ and if $x$ is an element of $V$, we write $x \in V$. If $f$ is a function from $V$ into $W$, we write $f: V \to W$ and we let $f(U) = \{f(x) \in W : x \in U\}$ for $U \subset V$, and $f^{-1}(y) = \{x \in V : f(x) = y\}$ for $y \in W$. Let $\emptyset$ denote the empty set, let $R$ denote the set of real numbers, and let $R^+ = (0, \infty)$. If $V_1, \dots, V_n$ are $n$ arbitrary sets, their Cartesian product is denoted by $\prod_{i=1}^{n} V_i = V_1 \times \cdots \times V_n$. If in particular, $V = V_1 = \cdots = V_n$, we write $\prod_{i=1}^{n} V_i = V^n$. Let $R^n$ be real $n$-space. If $x \in R^n$, then $x^T = (x_1, \dots, x_n)$ denotes the transpose of $x$. When using a norm for $x \in R^n$, $|x|$, we will have in mind $|x| = \max_{1 \le i \le n} \{|x_i|\}$. If $x \in R^n$ and $Y \subset R^n$, then $x \perp Y$ will mean that $x^T y = 0$ for all $y \in Y$. If $V \subset R^n$, then $\bar{V}$ and $\partial V$ represent the closure and the boundary of $V$ in $R^n$, respectively. Also, we let $B(\bar{x}, r) = \{x \in R^n : |x - \bar{x}| < r\}$ for $\bar{x} \in R^n$ and $r > 0$. If $A = [a_{ij}]$ is an arbitrary matrix, then $A^T$ denotes the transpose of $A$ and the norm of $A$ is defined as $\|A\| = \sup_{|x| \le 1} \{|Ax|\}$ (cf. [5, ch. 10, § 2]). If $A$ is a symmetric matrix, by $A > 0$ we mean that $A$ is positive definite and by $A \ge 0$ we mean that $A$ is positive semi-definite. If $E_1, \dots, E_n$ are $n$ vector spaces over $R$, $L(E_1, \dots, E_n; R)$ denotes the set of continuous multilinear maps from $\prod_{i=1}^{n} E_i$ to $R$ (cf. [6, appendix A]). If in particular, $E_1 = \cdots = E_n = E$, we write $L(E_1, \dots, E_n; R) = L^n(E; R)$. For a function $f: V \to W$, where $V \subset R^n$, $W \subset R$, the $k$th-order derivative (cf. [6, ch. 4]) is denoted by $D^k f: V \to L^k(R^n; R)$, if it exists. A function $F: V \to W$, where $V \subset R^n$, $W \subset R^m$, is said to be of class $C^k$ or a $C^k$-function if for each component of $F$, the $k$th-order derivative exists and is continuous.

Given a $C^2$-function $g: V \to R$ where $V \subset R^n$, we denote the gradient of $g$ by $\nabla g(x) = (\partial_1 g(x), \dots, \partial_n g(x))^T = Dg(x, \cdot)$ and we denote the $n \times n$ Jacobian matrix of $g$ by $J_g(x) = [\partial^2 g(x) / \partial x_i \partial x_j] = D^2 g(x, \cdot, \cdot)$. An element $\bar{x} \in V$ is called a critical point of $g$ if $\nabla g(\bar{x}) = 0$. Also, $\bar{x} \in V$ is said to be a local minimum of $g$ if there is an open neighborhood $U$ of $\bar{x}$ such that for all $x \in U$, $g(x) \ge g(\bar{x})$.

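
The vector norm and the induced matrix norm defined above can be evaluated directly. The short Python sketch below relies on the standard fact that, for the max norm, the supremum $\sup_{|x| \le 1} |Ax|$ equals the largest absolute row sum; the function names and the example matrix are assumptions made for illustration.

```python
def vec_norm(x):
    # |x| = max_i |x_i|
    return max(abs(xi) for xi in x)


def mat_vec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def mat_norm(A):
    # For the max norm, sup_{|x| <= 1} |Ax| equals the largest absolute row sum.
    return max(sum(abs(a) for a in row) for row in A)


def maximizer(A):
    # A unit vector attaining the supremum: the signs of the heaviest row.
    row = max(A, key=lambda r: sum(abs(a) for a in r))
    return [1 if a >= 0 else -1 for a in row]


A = [[1, -2], [3, 4]]
x_star = maximizer(A)
attained = vec_norm(mat_vec(A, x_star))
```

Here `x_star` has norm 1 and `attained` coincides with `mat_norm(A)`, which illustrates that the supremum in the definition is reached on the unit ball.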
B. Systems of Ordinary Differential Equations 

We will consider systems of first order, autonomous ordinary differential equations of the form

$$\dot{x} = f(x) \qquad (E)$$

where $x = (x_1, \dots, x_n)^T \in D$, $D$ is a non-empty connected open subset in $R^n$, $t \in R$, $\dot{x} = dx/dt$ and $f$ is a $C^1$-function from $D$ into $R^n$. We will have occasion to utilize the properties of (E) enumerated below (cf. [4, ch. 2]).
Lemma 2.1: For each $\bar{x} \in D$, there is a unique non-continuable solution of (E), given by

$$\varphi(\cdot, 0, \bar{x}): [0, \tau) \to D$$

with $\varphi(0, 0, \bar{x}) = \bar{x}$. This solution is non-continuable in the sense that if there is another solution of (E)

$$\varphi_1(\cdot, 0, \bar{x}): [0, t_1) \to D$$

with $\varphi_1(0, 0, \bar{x}) = \bar{x}$, we have $t_1 \le \tau$ and $\varphi_1 = \varphi$ on $[0, t_1)$. We call the function $\varphi(\cdot, 0, \bar{x})$ the solution of (E) starting at $\bar{x}$. For purposes of brevity, we frequently write $\varphi(t, \bar{x})$ or $\varphi(t)$ in place of $\varphi(t, 0, \bar{x})$, when the initial conditions $(0, \bar{x})$ are understood.

Lemma 2.2 ([4, ch. 2, corollary 3.2]): Assume that $D$ is bounded. Then for any solution $\varphi(\cdot, \bar{x}): [0, \tau) \to D$, either $\varphi(t) \to \partial D$ as $t \to \tau$ or $\tau = +\infty$.

A constant solution $\varphi(t, \bar{x}) = \bar{x}$ is said to be an equilibrium of (E). Equivalently, any point $\bar{x} \in D$ such that $f(\bar{x}) = 0$ is an equilibrium of (E).

Lemma 2.3: For a non-equilibrium solution $\varphi(\cdot, \bar{x}): [0, \tau) \to D$, $f(\varphi(t, \bar{x})) \ne 0$ for any $0 \le t < \tau$.

The proof of this lemma is a direct consequence of the uniqueness of the solutions of (E).

An equilibrium $\varphi(t) = \bar{x}$ of (E) is said to be isolated if there is an $r > 0$ such that for any $x \in B(\bar{x}, r) - \{\bar{x}\}$, $f(x) \ne 0$.

Let £ be an isolated equilibrium of (E). For the follow- 
ing definitions, we assume that solutions p(-,0, x) for (E) 
exist for all 1 > 0 when |x — #|<h for some h > Q 

a) & is said to be stable if for any e> 0 (e< A), there is 
a 6=8(e)>0 such that |p(1,0,x)- X|<e, for all re 
(0, + 00) whenever |x — %| <4; 

b) & is said to be asymptotically stable if (i) i is 
stable and (ii) there is an 7>0 (n<h) such that 
lim, — +9 !P(,0, x) — X] = 0 whenever jee oo 

c) & is said to be unstable if it is not stable. 

Finally, given a C!-function g: DR, we define the 
function Dgyg: D> R by Ding (x) =Ve(x)7 f(x) and we 
call Digg the derivative of g with respect to t along the 
solutions of (E). 
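
The definition of the derivative along solutions can be illustrated numerically. The following is a minimal sketch, not part of the original development: the function names and the test system $f(x) = -x$, $g(x) = |x|^2$ are assumptions chosen only for illustration, for which $\nabla g(x)^T f(x) = -2|x|^2$.

```python
def derivative_along_solutions(grad_g, f, x):
    """Return grad g(x)^T f(x), the derivative of g along solutions of x' = f(x)."""
    return sum(gi * fi for gi, fi in zip(grad_g(x), f(x)))


# Hypothetical illustration (not from the paper): f(x) = -x and g(x) = |x|^2.
def f(x):
    return [-xi for xi in x]


def grad_g(x):
    return [2.0 * xi for xi in x]


# At x = (1, 2) the derivative is 2*1*(-1) + 2*2*(-2).
value = derivative_along_solutions(grad_g, f, [1.0, 2.0])
```

A negative value at every non-equilibrium point, as here, is the kind of property used when $g$ serves as a Lyapunov function.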


III. NEURAL NETWORK MODEL


We will consider neural networks described by differential equations of the form

$$\dot{x} = -H(x)(-Tx + S(x) - I) \tag{L}$$

where $x = (x_1, \ldots, x_n)^T \in (-1,1)^n$, $\dot{x} = dx/dt$, $H$ is a function from $(-1,1)^n$ into $R^{n \times n}$ (i.e., for each $x \in (-1,1)^n$, $H(x)$ is an $n \times n$ matrix), $T = [T_{ij}]$ is an $n \times n$ constant matrix, $S(x) = (s_1(x_1), \ldots, s_n(x_n))^T$ where $s_i: (-1,1) \to R$, and $I = (I_1, \ldots, I_n)^T$ is a constant real vector. In subsequent assumptions, we will impose restrictions on the functions $H(x)$ and $S(x)$ and on the matrix $T$.


Remark 3.1: In [1], Hopfield considers a continuous-variable neural system which is realizable by electrical circuits and which is represented by the system of equations

$$C_i (du_i/dt) = \sum_{j=1}^{n} T_{ij} V_j - u_i/R_i + I_i, \qquad i = 1, \ldots, n. \tag{H}$$

$$\Lambda(1, -1) = \{x \in (-1,1)^2 : x_1 > 0, \ -x_2 > 0\} = (0,1) \times (-1,0)$$

and so forth.

Theorem 4.3: If system (L) satisfies Assumptions (A), (B-1), and (C), then there is at most one (asymptotically) stable equilibrium of (L) in each of the $2^n$ regions $\Lambda(\xi_1, \ldots, \xi_n)$ defined by (4.1).

Corollary 4.1: If (L) satisfies Assumptions (A), (B-1), and (C) and if no (asymptotically) stable equilibrium of (L) has a 0-coordinate, then the total number of (asymptotically) stable equilibrium points of (L) is less than or equal to $2^n$ (where $n$ is the order of system (L)).

Proof of Corollary 4.1: Since now all stable equilibrium points of (L) are in $\bigcup \Lambda(\xi_1, \ldots, \xi_n)$ (i.e., none of the stable equilibrium points are located on any axis in $R^n$), the corollary follows from Theorem 4.3 and from the fact that there are $2^n$ regions $\Lambda(\xi_1, \ldots, \xi_n)$ in $(-1,1)^n$ defined by (4.1).

Proof of Theorem 4.3: The proof of this theorem is accomplished in several parts (Lemmas 4.1, 4.2, and 4.3). We define the function $F: (-1,1)^n \to R^n$ by $F(x) = \nabla E(x) = -Tx + S(x) - I$. The derivative of $F$, $DF: (-1,1)^n \to L(R^n, R^n)$, is given by $DF(x) = J_E(x) = -T + \mathrm{diag}(s_1'(x_1), \ldots, s_n'(x_n))$.

We first consider the region $\Lambda = \Lambda(1, \ldots, 1) = (0,1)^n$ (see (4.1)).

By Assumption (C), for each $i$, the function defined by $s_{i+}' = s_i': (0,1) \to (\delta_i, +\infty)$, where $\delta_i = s_i'(0)$, is invertible. Denote the inverse of $s_{i+}'$ by $p_i = (s_{i+}')^{-1}: (\delta_i, +\infty) \to (0,1)$. Then both $s_{i+}'$ and $p_i$ are $C^1$-functions. Define the function $P: \prod_{i=1}^{n} (\delta_i, +\infty) \to \Lambda$ by $P(u) = (p_1(u_1), \ldots, p_n(u_n))^T$. We know that $P$ is invertible and that both $P$ and $P^{-1}$ are $C^1$-functions. We let $Q: \prod_{i=1}^{n} (\delta_i, +\infty) \to R^n$ be defined by $Q(u) = F(P(u))$ and we let

$$D = \{x \in \Lambda : DF(x) > 0\} \quad \text{and} \quad C = P^{-1}(D).$$


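Lemma 4.1: $C$ is convex.

Proof: For any $y, z \in C$ and for any $\lambda \in (0,1)$, we must show that $(1-\lambda)y + \lambda z \in C$. Since $DF(P(y)) > 0$ and $DF(P(z)) > 0$, then

$$(1-\lambda) DF(P(y)) > 0 \quad \text{and} \quad \lambda DF(P(z)) > 0.$$

Since for any $u \in \prod_{i=1}^{n} (\delta_i, +\infty)$, $DF(P(u)) = -T + \mathrm{diag}(u_1, \ldots, u_n)$, we have

$$\begin{aligned}
DF(P((1-\lambda)y + \lambda z)) &= -T + \mathrm{diag}((1-\lambda)y_1 + \lambda z_1, \ldots, (1-\lambda)y_n + \lambda z_n) \\
&= (1-\lambda)(-T + \mathrm{diag}(y_1, \ldots, y_n)) + \lambda(-T + \mathrm{diag}(z_1, \ldots, z_n)) \\
&= (1-\lambda) DF(P(y)) + \lambda DF(P(z)) > 0
\end{aligned}$$

and therefore $(1-\lambda)y + \lambda z \in C$. This proves the lemma.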
Lemma 4.2: There is at most one zero of the function $F$ in region $D$.

Proof: Suppose there is more than one zero of $F$ in $D$. Then there exist $y, z \in C$, $y \ne z$, such that $Q(y) = Q(z) = 0$. By the Mean Value Theorem, there exists $\lambda \in (0,1)$ such that $Q(y) = Q(z) + DQ(z + \lambda(y - z))(y - z)$. Therefore, $DQ((1-\lambda)z + \lambda y)(y - z) = 0$. From the Chain Rule given by

$$DQ(u) = DF(P(u)) DP(u) \quad \text{for } u \in \prod_{i=1}^{n} (\delta_i, +\infty)$$

we obtain

$$DF(P((1-\lambda)z + \lambda y)) \, DP((1-\lambda)z + \lambda y)(y - z) = 0.$$

Since for any $u \in \prod_{i=1}^{n} (\delta_i, +\infty)$, $DP(u) = \mathrm{diag}(p_1'(u_1), \ldots, p_n'(u_n))$ and $\det(DP(u)) = \prod_{i=1}^{n} p_i'(u_i) \ne 0$, we have $\det(DP((1-\lambda)z + \lambda y)) \ne 0$. Also, by Lemma 4.1 the point $(1-\lambda)z + \lambda y$ lies in $C$, so that $DF(P((1-\lambda)z + \lambda y)) > 0$ and hence

$$\det(DF(P((1-\lambda)z + \lambda y))) \ne 0.$$

Therefore, $y - z = 0$, which contradicts the assumption made at the outset of the proof. This establishes the lemma.

Lemma 4.3: There is at most one (asymptotically) stable equilibrium in $\Lambda$.

Proof: Suppose there are two asymptotically stable equilibrium points of (L), $x_1$ and $x_2$, in the region $\Lambda$. By Theorem 4.2, $DF(x_i) = J_E(x_i) > 0$, $i = 1, 2$. Therefore, $x_1, x_2 \in D$. Since for $i = 1, 2$, $H(x_i) \nabla E(x_i) = 0$ and $H(x_i)$ is nonsingular, $F(x_i) = \nabla E(x_i) = 0$. It follows from Lemma 4.2 that $x_1 = x_2$. This proves the lemma.

Since we can prove Lemmas 4.1, 4.2, and 4.3 for any of the regions defined by (4.1) in an identical manner, the proof of Theorem 4.3 is complete.


V. MAIN RESULTS: SYNTHESIS 


Suppose that we are given information in the form of a 
collection of specified vectors which we wish to store as 
equilibrium points of a neural network of the form (L), 
satisfying Assumptions (A) and (C). In the present section 
we will address questions of the following type: (1) Can 
the specified information (vectors) be stored as equilibrium 
points of system (L)? (2) If the answer to question (1) is 
affirmative, how can this be accomplished? (3) Can this 
information (vectors) be stored as asymptotically stable 
equilibrium points of the neural network? 

In the following, we assume that the functions $S(x)$ and $H(x)$, which satisfy Assumptions (A) and (C), for system (L) are given and that $m$ (information) vectors $\{a_1, \ldots, a_m\} \subset (-1,1)^n$ are specified. We wish to determine an $n \times n$ symmetric matrix $T$ and an external input vector $I \in R^n$ such that the vectors $a_1, \ldots, a_m$ are equilibrium points of system (L). In doing so, we touch on the questions raised above.


Lemma 5.1: Given $m$ vectors $\{a_1, \ldots, a_m\} \subset (-1,1)^n$, the vectors $a_1, \ldots, a_m$ are equilibrium points of system (L), satisfying Assumptions (A) and (C), if and only if the unknown matrix $\hat{T} = T$ and the unknown vector $\beta = I$ comprise a solution of the system of equations given by

$$\hat{T} a_i + \beta = S(a_i), \qquad i = 1, \ldots, m \tag{5.1}$$

where the matrix $\hat{T} = [\hat{t}_{ij}] = [\hat{t}_{ji}] = \hat{T}^T$ contains $n(n+1)/2$ unknowns and $\beta = (\beta_1, \ldots, \beta_n)^T$ has $n$ unknowns.

Proof: 1) Suppose that $a_1, \ldots, a_m$ are equilibrium points of system (L). Then $H(a_i)(-T a_i + S(a_i) - I) = 0$, $i = 1, \ldots, m$. By Assumption (A), $H(a_i)$ is non-singular and thus $T a_i + I = S(a_i)$, $i = 1, \ldots, m$. Also by Assumption (A), $T$ is symmetric. Therefore, $\hat{T} = T$ and $\beta = I$ constitute a solution of (5.1).

2) Assume that $\hat{T} = T$ and $\beta = I$ form a solution of (5.1). Then $T$ is symmetric and $H(a_i)(-T a_i + S(a_i) - I) = 0$, $i = 1, \ldots, m$. Therefore, when this particular choice of $T$ and $I$ is used, system (L) satisfies Assumptions (A) and (C) and the vectors $a_1, \ldots, a_m$ will be equilibrium points of (L).

Equation (5.1) contains $m \times n$ equations and $n(n+3)/2$ unknowns. Let us attempt to solve this system of equations.

Lemma 5.2: Given $m$ vectors $\{a_1, \ldots, a_m\} \subset (-1,1)^n$, $\hat{T} = T$ and $\beta = I$ comprise a solution of (5.1) if and only if $\hat{T} = T$ and $\beta = I$ are a solution of the equations

$$\hat{T} A = B \tag{5.2a}$$
$$\beta = -\hat{T} a_m + S(a_m) \tag{5.2b}$$

where $A = [a_1 - a_m, \ldots, a_{m-1} - a_m]$, $B = [S(a_1) - S(a_m), \ldots, S(a_{m-1}) - S(a_m)]$, $\hat{T} = [\hat{t}_{ij}] = [\hat{t}_{ji}] = \hat{T}^T$ contains $n(n+1)/2$ unknowns, and $\beta = (\beta_1, \ldots, \beta_n)^T$ contains $n$ unknowns.

Proof: Obvious.

Lemma 5.2 enables us to determine $\hat{T} = T$ by solving (5.2a), and then $\beta = I$ by substituting $\hat{T}$ into (5.2b).
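
The matrices $A$ and $B$ of Lemma 5.2 are simple column differences, and the product $A^T B$ formed from them can be tested for symmetry directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, matrices are stored as lists of rows, and a single scalar function `s` is applied to every coordinate (that is, $s_1 = \cdots = s_n$), which the general model does not require.

```python
import math


def difference_matrices(vectors, s):
    """Build A = [a_1 - a_m, ...] and B = [S(a_1) - S(a_m), ...] as lists of rows."""
    *head, last = vectors
    n = len(last)
    A = [[a[r] - last[r] for a in head] for r in range(n)]
    B = [[s(a[r]) - s(last[r]) for a in head] for r in range(n)]
    return A, B


def gram_is_symmetric(A, B, tol=1e-9):
    """Return True when the (m-1) x (m-1) matrix A^T B is symmetric within tol."""
    rows, cols = len(A), len(A[0])
    G = [[sum(A[r][i] * B[r][j] for r in range(rows)) for j in range(cols)]
         for i in range(cols)]
    return all(abs(G[i][j] - G[j][i]) <= tol
               for i in range(cols) for j in range(i))


# Hypothetical data: three vectors in (-1, 1)^2 with s = tan.
A, B = difference_matrices([(0.5, 0.0), (0.0, 0.5), (0.0, 0.0)], math.tan)
symmetric_case = gram_is_symmetric(A, B)
```

For $m = 2$ the product $A^T B$ is $1 \times 1$, so the check passes trivially.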

Next, we consider a singular value decomposition for $A$ given by

$$A = U \Sigma V^T \tag{5.3}$$

where $U$ is an $n \times n$ unitary matrix, $V$ is an $(m-1) \times (m-1)$ unitary matrix (i.e., $U^T = U^{-1}$ and $V^T = V^{-1}$), and $\Sigma = [\sigma_{ij}]$ is an $n \times (m-1)$ matrix such that all $\sigma_{ij}$ are equal to zero except $\sigma_{ii} > 0$, $i = 1, \ldots, k$, where $k$ denotes the rank of $A$.

Substituting (5.3) into (5.2a), we obtain $\hat{T}(U \Sigma V^T) = B$ and

$$(U^T \hat{T} U) \Sigma = U^T B V. \tag{5.4}$$

Since on the left-hand side of (5.4) the last $(m-1-k)$ columns of $(U^T \hat{T} U) \Sigma$ are zero, then to ensure a solution of (5.4), the last $(m-1-k)$ columns of $U^T B V$, on the right-hand side of (5.4), must also be zero. We thus have the following:

Lemma 5.3: A necessary condition for (5.4) (as well as (5.2a)) to have a solution is that the last $(m-1-k)$ columns of $U^T B V$ are zero.

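Suppose that the condition of Lemma 5.3 is satisfied. Let the matrix formed by the first $k$ columns of $\Sigma$ be denoted by

$$\begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix} \tag{5.5}$$

where $\Sigma_1 = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{kk})$, and let the matrix formed by the first $k$ columns of $U^T B V$ be denoted by

$$\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} \tag{5.6}$$

where $C_1$ is a $k \times k$ matrix and $C_2$ is an $(n-k) \times k$ matrix. Also, let

$$P = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & E \end{bmatrix} \tag{5.7}$$

where $E$ denotes the $(n-k) \times (n-k)$ identity matrix. Next, we need to solve

$$P^T U^T \hat{T} U P = P^T \begin{bmatrix} C_1 & \Phi \\ C_2 & \Psi \end{bmatrix} = \begin{bmatrix} \Sigma_1 C_1 & \Sigma_1 \Phi \\ C_2 & \Psi \end{bmatrix} \tag{5.8}$$

where $\Phi$ is a $k \times (n-k)$ parameter matrix and $\Psi$ is an $(n-k) \times (n-k)$ parameter matrix.

Since $P^T U^T \hat{T} U P$ is symmetric, to ensure a solution of (5.8), we require that

$$\Sigma_1 C_1 = (\Sigma_1 C_1)^T = C_1^T \Sigma_1 \tag{5.9}$$

and $\Phi = \Sigma_1^{-1} C_2^T$, $\Psi = \Psi^T$. We have thus arrived at the following:

Lemma 5.4: A necessary condition for (5.8) (as well as (5.2a)) to have a solution is that $\Sigma_1 C_1$ be symmetric.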


If the conditions of both Lemmas 5.3 and 5.4 are satisfied, then (5.2a) is solved by

$$\hat{T} = U Q U^T \tag{5.10}$$

where

$$Q = \begin{bmatrix} C_1 \Sigma_1^{-1} & (C_2 \Sigma_1^{-1})^T \\ C_2 \Sigma_1^{-1} & \Psi \end{bmatrix} \tag{5.11}$$

where $\Psi = [\psi_{ij}] = [\psi_{ji}] = \Psi^T$ is an $(n-k) \times (n-k)$ matrix which contains $(n-k)(n-k+1)/2$ parameters. We thus have the following:

Lemma 5.5: a) Equation (5.4) (as well as (5.2a)) has a solution if and only if

i) the last $(m-1-k)$ columns of $U^T B V$ are zero; and
ii) $\Sigma_1 C_1$ is symmetric.

b) When conditions i) and ii) are true, equation (5.2a) is solved by the expressions (5.10) and (5.11) up to $(n-k)(n-k+1)/2$ unknown parameters.

Lemma 5.6: If the last $(m-1-k)$ columns of $U^T B V$ are zero, then the following statements are equivalent:

i) $A^T B$ is symmetric;

ii) $\Sigma_1 C_1$ is symmetric.

Proof: $A^T B$ is symmetric if and only if $(V \Sigma^T U^T) B = B^T (U \Sigma V^T)$, by (5.3), and this is true if and only if $\Sigma^T (U^T B V) = V^T B^T U \Sigma = (\Sigma^T (U^T B V))^T$, and this is true if and only if the matrix

$$\begin{bmatrix} \Sigma_1 C_1 & 0 \\ 0 & 0 \end{bmatrix}$$

is symmetric (by (5.5) and (5.6)), and this is true if and only if $\Sigma_1 C_1$ is symmetric. This concludes the proof of the lemma.


We summarize Lemmas 5.1, 5.2, 5.5, and 5.6 in the following results:

Theorem 5.1: Given $m$ vectors $\{a_1, \ldots, a_m\} \subset (-1,1)^n$, then:

a) the vectors $a_1, \ldots, a_m$ are candidates for equilibrium points of system (L), satisfying Assumptions (A) and (C), if and only if

(i) the last $(m-1-k)$ columns of $U^T B V$ are zero;
(ii) $A^T B$ is symmetric,

where $A = [a_1 - a_m, \ldots, a_{m-1} - a_m]$, $k$ = rank of $A$, $B = [S(a_1) - S(a_m), \ldots, S(a_{m-1}) - S(a_m)]$, and $U$ and $V$ are given in the singular value decomposition of $A$ (see (5.3)).

b) when part a) is true, we can synthesize a system (L) for which the vectors $a_1, \ldots, a_m$ are equilibrium points, by determining the symmetric matrix $T$ and the input vector $I$ up to $(n-k)(n-k+1)/2$ unknown parameters, using the relation

$$T = U Q U^T \quad \text{and} \quad I = S(a_m) - T a_m$$

where

$$Q = \begin{bmatrix} C_1 \Sigma_1^{-1} & (C_2 \Sigma_1^{-1})^T \\ C_2 \Sigma_1^{-1} & \Psi \end{bmatrix} \tag{5.11}$$

where $\Sigma_1$, $C_1$, and $C_2$ are given in (5.5) and (5.6) and $\Psi = [\psi_{ij}] = [\psi_{ji}] = \Psi^T$ is an $(n-k) \times (n-k)$ parameter matrix.

Corollary 5.1: Given $m$ linearly independent vectors $\{a_1, \ldots, a_m\} \subset (-1,1)^n$, then:

a) the vectors $\{a_1, \ldots, a_m\}$ are candidates for equilibrium points of system (L), satisfying Assumptions (A) and (C), if and only if $A^T B$ is symmetric;

b) when part a) is true, then the symmetric matrix $T$ and the vector $I$ can be determined up to $(n-m+1)(n-m+2)/2$ unknown parameters, which is the count $(n-k)(n-k+1)/2$ of Theorem 5.1 evaluated at $k = m-1$.

Proof: Since now $k$ = rank of $A$ = $m-1$, there are $m-1-k = 0$ columns to check, so condition i) in part a) of Theorem 5.1 is true and the result follows.

Remark 5.1: 1) The synthesis procedure of Theorem 5.1 will yield $a_1, \ldots, a_m$ as equilibrium points. But in general, system (L) constructed in this way may possess other equilibrium points.

2) Once we have determined the matrix $T$ and the vector $I$ (by means of Theorem 5.1), we can use Theorem 4.2 or other results (cf. [7]) to determine which of the equilibrium points $a_1, \ldots, a_m$ are asymptotically stable. We have

Lemma 5.7: Suppose that system (L) has been synthesized by Theorem 5.1, with equilibrium points $a_1, \ldots, a_m$. Then $a_i$ will be asymptotically stable if

$$J_E(a_i) = \mathrm{diag}[s_1'(a_i^{(1)}), \ldots, s_n'(a_i^{(n)})] - T$$

is positive definite, where $a_i^{(j)}$ is the $j$th coordinate of the vector $a_i$.
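
The condition of Lemma 5.7 can be tested without computing eigenvalues. The following is a minimal sketch that swaps in a different but equivalent test for symmetric matrices: a Cholesky factorization exists exactly when the matrix is positive definite. The function names are hypothetical, and the values of $s_j'(a_i^{(j)})$ are assumed to be supplied by the caller.

```python
import math


def jacobian_JE(T, s_prime_values):
    """Return J_E = diag(s_prime_values) - T as a list of rows."""
    n = len(T)
    return [[(s_prime_values[i] if i == j else 0.0) - T[i][j] for j in range(n)]
            for i in range(n)]


def is_positive_definite(M, tol=1e-12):
    """Attempt a Cholesky factorization of the symmetric matrix M; succeed iff M > 0."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= tol:      # non-positive pivot: M is not positive definite
                    return False
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True
```

Applied to `jacobian_JE(T, ...)` at each stored vector, a `True` result means the hypothesis of Lemma 5.7 holds for that vector; a `False` result is inconclusive, since the lemma gives only a sufficient condition.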
Proof: If $J_E(a_i)$ is positive definite, then by the Inverse Function Theorem $a_i$ is an isolated equilibrium of (L). The lemma then follows from the fundamental stability theory of Lyapunov (see the proof of (3) ⇒ (4) in Theorem 4.2).

Lemma 5.8: Suppose that system (L) has been synthesized by Theorem 5.1, with equilibrium points $a_1, \ldots, a_m$. Let

$$\alpha_i = \min \{ s_j'(a_i^{(j)}) : a_i^{(j)} \text{ is the } j\text{th coordinate of } a_i, \ 1 \le j \le n \}. \tag{5.12}$$

Then $a_i$ will be asymptotically stable if

$$\lambda_l < \alpha_i, \qquad l = 1, \ldots, n$$

where $\lambda_l$, $1 \le l \le n$, denote the eigenvalues of $T$.

Proof: By Lemma 5.7, it suffices to show that $J_E(a_i) > 0$. Since $T$ is symmetric, we know that there exists a unitary matrix $P$ such that

$$T = P^T \mathrm{diag}(\lambda_1, \ldots, \lambda_n) P$$

where $\lambda_l$, $l = 1, \ldots, n$, denote the eigenvalues of $T$. For any nonzero $y \in R^n$, let $z = Py$. Since $P$ is unitary, we have that

$$\sum_{j=1}^{n} y_j^2 = \sum_{j=1}^{n} z_j^2$$

where $y_j$ and $z_j$ are the $j$th coordinates of $y$ and $z$, respectively. Then

$$y^T J_E(a_i) y = \sum_{j=1}^{n} s_j'(a_i^{(j)}) y_j^2 - \sum_{j=1}^{n} \lambda_j z_j^2 > \alpha_i \sum_{j=1}^{n} y_j^2 - \alpha_i \sum_{j=1}^{n} z_j^2 = 0.$$

Thus $J_E(a_i)$ is positive definite.

Theorem 5.2: Suppose that system (L) has been synthesized by Theorem 5.1, with equilibrium points $a_1, \ldots, a_m$. Then $a_i$ will be asymptotically stable if we can choose the $(n-k) \times (n-k)$ symmetric parameter submatrix $\Psi$ in the matrix $Q$ (see (5.11)) such that

$$\lambda_l < \alpha_i, \qquad l = 1, \ldots, n$$

where $\lambda_l$, $1 \le l \le n$, denote the eigenvalues of $Q$ and $\alpha_i$ is defined by (5.12).

Proof: By (5.10), $T = U Q U^T$, where $U$ is a unitary matrix. By the properties of unitary matrices, $T$ and $Q$ have the same set of eigenvalues. Thus the theorem follows from Lemma 5.8.

Remark 5.2: Theorem 5.2 gives a relation between the stability of $a_i$ and the eigenvalues of the matrix $Q$. This suggests that we should determine the parameter matrix $\Psi$ in such a way that the eigenvalues of $Q$ are as small as possible. One easy way of doing this is to let $\Psi = -\alpha E$, where $\alpha$ is a large positive number and $E$ is the identity matrix.

Remark 5.3: By Lemma 3.3, Assumption (B-1) is true for system (L) for almost all choices of the matrix $T$ and the vector $I$. When Assumption (B-1) is true, by Theorem 4.3, there is at most one asymptotically stable equilibrium of system (L) in each of the regions $\Lambda(\xi_1, \ldots, \xi_n)$ defined by (4.1). Thus if there are $a_i \ne a_j$ in the given information vector set such that $a_i$ and $a_j$ are in the same region $\Lambda(\xi_1, \ldots, \xi_n)$, then it may not be possible to store both $a_i$ and $a_j$ as asymptotically stable equilibrium points of system (L).


VI. SYNTHESIS PROCEDURE AND AN EXAMPLE


In the present section, we summarize the results of Section V and we demonstrate the applicability of our results by constructing a three-dimensional neural network with specific (asymptotically stable) equilibrium points.

We assume that the functions $S: (-1,1)^n \to R^n$ and $H: (-1,1)^n \to R^{n \times n}$ are given and satisfy Assumptions (A) and (C). Also, we assume that we are given $m$ information vectors $\{a_1, \ldots, a_m\} \subset R^n$ which are to be stored as (asymptotically stable) equilibrium points in a neural network of the form (L). Our goal, then, is to determine an $n \times n$ symmetric matrix $T$ and an $n$-dimensional vector $I$ such that the (asymptotically stable) equilibrium points of

$$\dot{x} = -H(x)(-Tx + S(x) - I) \tag{L}$$

include the vectors $a_1, \ldots, a_m$.

Summary of Procedure 

i) Check if there are $a_i \ne a_j$ which are located in the same region $\Lambda(\xi_1, \ldots, \xi_n)$ defined by (4.1). If this is true, it may not be possible to store both $a_i$ and $a_j$ as asymptotically stable equilibrium points of (L).

ii) Compute

$$A = [a_1 - a_m, \ldots, a_{m-1} - a_m],$$
$$B = [S(a_1) - S(a_m), \ldots, S(a_{m-1}) - S(a_m)],$$

and $A^T B$.

iii) Check if $A^T B$ is symmetric. If this is not true, $a_1, \ldots, a_m$ cannot all be equilibrium points of (L).

iv) Perform a singular value decomposition of $A$ to obtain the matrices $U$, $V$, and $\Sigma$ such that $A = U \Sigma V^T$, where $U$, $V$ are unitary matrices and where $\Sigma$ is a diagonal matrix with the singular values of $A$ on its diagonal. (This can be accomplished by standard computer routines.) In doing so, we determine $k$ = the rank of $A$, which is equal to the number of nonzero diagonal elements of $\Sigma$.

v) Compute $U^T B V$.

vi) Check if the last $(m-1-k)$ columns of $U^T B V$ are all zero. If this is not true, then $a_1, \ldots, a_m$ cannot all be equilibrium points of (L).

vii) Choose an $(n-k) \times (n-k)$ symmetric parameter matrix $\Psi$. We may choose $\Psi = -\alpha E$, where $\alpha$ is a large positive real number and $E$ is the $(n-k) \times (n-k)$ identity matrix.
viii) Compute

$$Q = \begin{bmatrix} C_1 \Sigma_1^{-1} & (C_2 \Sigma_1^{-1})^T \\ C_2 \Sigma_1^{-1} & \Psi \end{bmatrix}$$

where $C_1$ is formed by the first $k$ rows and first $k$ columns of $U^T B V$, $C_2$ is formed by the last $(n-k)$ rows and first $k$ columns of $U^T B V$, and $\Sigma_1$ is formed by the first $k$ rows and first $k$ columns of $\Sigma$.

ix) Compute

$$T = U Q U^T \quad \text{and} \quad I = S(a_m) - T a_m.$$

x) Check if $S(a_i) - T a_i - I = 0$, $i = 1, \ldots, m$. This should be true, in which case $a_1, \ldots, a_m$ are equilibrium points of system (L).

xi) Check if all of the eigenvalues of $J_E(a_i) = \mathrm{diag}[s_1'(a_i^{(1)}), \ldots, s_n'(a_i^{(n)})] - T$ are positive, where $a_i^{(j)}$ is the $j$th coordinate of the vector $a_i$, $i = 1, \ldots, m$. If this is true, then $a_1, \ldots, a_m$ are asymptotically stable equilibrium points of (L).

Example 6.1: Let $n = 3$ and let the function $s: (-1,1) \to R$ be given by

$$s(\rho) = (1/\lambda)(2/\pi) \tan((\pi/2)\rho)$$

where $\lambda = 1.4$. Then

$$s'(\rho) = (1/\lambda) [1/\cos^2((\pi/2)\rho)],$$
$$s''(\rho) = (\pi/\lambda) \sin((\pi/2)\rho)/\cos^3((\pi/2)\rho).$$

Let $S: (-1,1)^3 \to R^3$ be given by $S(x) = (s(x_1), s(x_2), s(x_3))^T$ and let $H: (-1,1)^3 \to R^{3 \times 3}$ be given by $H(x) = \mathrm{diag}(1/s'(x_1), 1/s'(x_2), 1/s'(x_3))$. Then both $S$ and $H$ satisfy Assumptions (A) and (C).
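
The function $s$ of Example 6.1 and its derivative are easy to evaluate numerically. The following is a minimal sketch (the names `s`, `s_prime`, and `S` as Python functions are our own) that can be used to reproduce the values of $S(a_1)$ and $S(a_2)$ reported in step ii) of the example.

```python
import math

LAMBDA = 1.4  # gain parameter lambda of Example 6.1


def s(rho):
    """s(rho) = (1/lambda)(2/pi) tan((pi/2) rho), defined for -1 < rho < 1."""
    return (1.0 / LAMBDA) * (2.0 / math.pi) * math.tan(0.5 * math.pi * rho)


def s_prime(rho):
    """s'(rho) = (1/lambda) / cos^2((pi/2) rho)."""
    return (1.0 / LAMBDA) / math.cos(0.5 * math.pi * rho) ** 2


def S(x):
    """Apply s to every coordinate of the vector x."""
    return [s(xi) for xi in x]


S_a1 = S([-0.6, 0.5, -0.4])
S_a2 = S([0.4, 0.8, -0.7])
```

Since $s'(\rho) \ge 1/\lambda > 0$ on $(-1,1)$, the diagonal entries $1/s'(x_i)$ of $H(x)$ are well defined and positive.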

Also, let $m = 2$ and let the desired equilibrium points for (L) be given by $a_1 = (-0.6, 0.5, -0.4)^T$ and $a_2 = (0.4, 0.8, -0.7)^T$.

We aim to find a $3 \times 3$ symmetric matrix $T$ and a three-dimensional vector $I$ for a neural network of the form (L), having asymptotically stable equilibrium points $a_1$ and $a_2$.

In the following, we use the procedure given above, employing identical step numbers:

i) Since $-0.6 < 0$ and $0.4 > 0$, $a_1$ and $a_2$ are not in the same region $\Lambda(\xi_1, \xi_2, \xi_3)$. Accordingly, it may be possible to store $a_1$ and $a_2$ as asymptotically stable equilibrium points of (L).

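ii) We compute $A$, $B$, and $A^T B$ as

$$A = \begin{bmatrix} \mathtt{-1.0000E+00} \\ \mathtt{-3.0000E-01} \\ \mathtt{3.0000E-01} \end{bmatrix},$$

$$S(a_1) = (\mathtt{-6.2588E-01}, \mathtt{4.5473E-01}, \mathtt{-3.3038E-01})^T,$$

$$S(a_2) = (\mathtt{3.3038E-01}, \mathtt{1.3995E+00}, \mathtt{-8.9245E-01})^T,$$

$$B = \begin{bmatrix} \mathtt{-9.5626E-01} \\ \mathtt{-9.4478E-01} \\ \mathtt{5.6208E-01} \end{bmatrix} \quad \text{and} \quad A^T B = [\mathtt{1.4083E+00}].$$
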
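iii) Since $A^T B$ is a $1 \times 1$ matrix, it is symmetric. Accordingly, $a_1$ and $a_2$ may be equilibrium points of system (L).

iv) A singular value decomposition of $A$, $A = U \Sigma V^T$, yields

$$U = \begin{bmatrix} \mathtt{-9.2057E-01} & \mathtt{-2.7617E-01} & \mathtt{2.7617E-01} \\ \mathtt{-2.7617E-01} & \mathtt{9.6029E-01} & \mathtt{3.9713E-02} \\ \mathtt{2.7617E-01} & \mathtt{3.9713E-02} & \mathtt{9.6029E-01} \end{bmatrix},$$

$$\Sigma = \begin{bmatrix} \mathtt{1.0863E+00} \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad V = [1],$$

and the rank of $A$ is equal to $k = 1$.

v) Compute the matrix $U^T B V$ as

$$U^T B V = \begin{bmatrix} \mathtt{1.2965E+00} \\ \mathtt{-6.2085E-01} \\ \mathtt{2.3814E-01} \end{bmatrix}.$$

vi) Since $m - 1 - k = 2 - 1 - 1 = 0$, no columns of $U^T B V$ need to be checked. Accordingly, $a_1$ and $a_2$ can be equilibrium points of (L).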

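vii) Choose the parameter matrix $\Psi$ as

$$\Psi = -\alpha \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

where $\alpha = 10$.

viii) Compute the matrix $Q$ as

$$Q = \begin{bmatrix} \mathtt{1.1935E+00} & \mathtt{-5.7154E-01} & \mathtt{2.1923E-01} \\ \mathtt{-5.7154E-01} & \mathtt{-1.0000E+01} & 0 \\ \mathtt{2.1923E-01} & 0 & \mathtt{-1.0000E+01} \end{bmatrix}.$$

ix) Compute $T$ and $I$ as

$$T = \begin{bmatrix} \mathtt{-9.1608E-01} & \mathtt{3.2827E+00} & \mathtt{-2.9584E+00} \\ \mathtt{3.2827E+00} & \mathtt{-8.8479E+00} & \mathtt{-1.0548E+00} \\ \mathtt{-2.9584E+00} & \mathtt{-1.0548E+00} & \mathtt{-9.0425E+00} \end{bmatrix}$$

and

$$I = \begin{bmatrix} \mathtt{-4.0002E+00} \\ \mathtt{6.4264E+00} \\ \mathtt{-5.1950E+00} \end{bmatrix}.$$

x) Since

$$S(a_1) - T a_1 - I = \begin{bmatrix} 0 \\ \mathtt{2.2204E-16} \\ 0 \end{bmatrix} \quad \text{and} \quad S(a_2) - T a_2 - I = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},$$

$a_1$ and $a_2$ are equilibrium points of system (L) (the nonzero entry is at the level of floating-point rounding error).

xi) Since the eigenvalues of $J_E(a_1)$,

6.8059E-01, 1.1163E+01, 1.1550E+01

are all positive, and since the eigenvalues of $J_E(a_2)$,

4.8305E-01, 1.3276E+01, 1.7085E+01

are all positive, the vectors $a_1$ and $a_2$ are asymptotically stable equilibrium points of system (L).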
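
The numbers of Example 6.1 can be cross-checked with a few lines of code. The following is a minimal sketch under stated assumptions, not the general procedure: it covers only the case $m = 2$ (so $A$ is a single column and $k = 1$) with $\Psi = -\alpha E$. In that case $T = U Q U^T$ reduces to the basis-free expression $T = -\alpha E + [(c + \alpha) A A^T + A b_\perp^T + b_\perp A^T]/|A|^2$ with $c = A^T B/|A|^2$ and $b_\perp = B - cA$; this closed form is our own specialization and the function names are hypothetical.

```python
import math

LAMBDA = 1.4  # gain parameter lambda of Example 6.1


def s(rho):
    """s(rho) = (1/lambda)(2/pi) tan((pi/2) rho)."""
    return (1.0 / LAMBDA) * (2.0 / math.pi) * math.tan(0.5 * math.pi * rho)


def synthesize_two_vectors(a1, a2, alpha):
    """Return (T, I) storing a1 and a2 as equilibria, for m = 2 and Psi = -alpha*E."""
    n = len(a1)
    A = [p - q for p, q in zip(a1, a2)]            # A = a1 - a2 (single column)
    B = [s(p) - s(q) for p, q in zip(a1, a2)]      # B = S(a1) - S(a2)
    sigma2 = sum(x * x for x in A)                 # squared singular value |A|^2
    c = sum(x * y for x, y in zip(A, B)) / sigma2  # equals C_1 / Sigma_1
    b_perp = [y - c * x for x, y in zip(A, B)]     # component of B orthogonal to A
    T = [[((c + alpha) * A[i] * A[j] + A[i] * b_perp[j] + b_perp[i] * A[j]) / sigma2
          for j in range(n)] for i in range(n)]
    for i in range(n):
        T[i][i] -= alpha                           # the -alpha*E term
    # I = S(a_m) - T a_m with a_m = a2
    I = [s(a2[i]) - sum(T[i][j] * a2[j] for j in range(n)) for i in range(n)]
    return T, I


def residual(T, I, a):
    """Return S(a) - T a - I, which vanishes at an equilibrium of (L)."""
    n = len(a)
    return [s(a[i]) - sum(T[i][j] * a[j] for j in range(n)) - I[i] for i in range(n)]


a1, a2 = [-0.6, 0.5, -0.4], [0.4, 0.8, -0.7]
T, I = synthesize_two_vectors(a1, a2, alpha=10.0)
```

The matrix `T` is symmetric by construction, and `residual(T, I, a1)` and `residual(T, I, a2)` should vanish up to rounding, which corresponds to step x) of the procedure.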
VII. CONCLUDING REMARKS


In this paper, we considered a class of neural networks which includes the Hopfield model. For this class of systems we established a variety of results which enable us to perform a qualitative analysis of such systems. For this class of systems we also developed results which enable us to synthesize neural networks with specified asymptotically stable equilibrium points. Work in progress aims to remove some of the limitations of our analysis and synthesis procedures.

Most (but not all) of the present results are global. In a companion paper [7] we will be concentrating on local results.
NEUROCOMPUTER


Detailed knowledge of how the brain works can have practical applications. Artificial neural networks are currently enjoying renewed interest in the United States. These systems, which store and retrieve information in a manner "similar" to the brain, are particularly well suited to the parallel processing of complex problems such as image or speech recognition. A few spectacular applications built in the laboratory have revived university research, and many American companies are now moving to commercialize industrial applications. Recent research advances in network topologies, learning algorithms, and the implementation of analog VLSI circuits have given birth to the neurocomputer.
[...] theoretical models that attempt to explain how the cells of the brain and their interconnections manage [...] output if this sum exceeds a threshold [...] such a network is a programmable dynamical system that can be used [...] ("The Brain and the Computer", February 1986, and "Associative Memories", March 1987).

Fig. 1. The neuron is modeled as a unit that sums its n weighted inputs and passes the result through a nonlinear threshold. Three types of threshold are shown. [Recoverable figure labels: simple threshold; logic threshold; input amplitude coming from a neuron; connection weights; variable coefficient used to shift the threshold.]

These capabilities go well beyond the mere parallel execution of computations, thanks to a fundamental property: an artificial neural network is not programmed with instructions, but by example.

The learning phase consists of presenting the network with a series of inputs and modifying the connections of the network so that each of these inputs produces the desired output. In a character recognition system, for example, the digitized sign is applied at the input, and the output layer produces the character identified by the network.

Information is stored in a distributed manner in the connections of the network. This type of storage makes it possible to process information very differently from conventional computers.

The main properties of these networks stem from the organization phenomena that appear during learning: the network automatically classifies what it learns, learning to distinguish, for example, a series of A's from a series of B's. A structured internal representation is created of the world presented to it at the input.

The network will subsequently manage to perform the processing for which it was trained by identifying the applied input among the knowledge accumulated during learning and by producing the most plausible output.

This ability to learn from experience is very important because it makes it possible to use new programming techniques. "Neural networks can learn to perform tasks that we have never managed to program on a computer... tasks so complex that we are unable to discover the algorithm that would carry them out," explains Robert Hecht-Nielsen enthusiastically. He is the founder of Hecht-Nielsen Neurocomputer, of San Diego. "Image compression by neural network is already superior to the best programs designed by man." This type of programming cannot always be used: neurocomputers are suited to high-dimensional problems in which contradictory constraints must be satisfied simultaneously, that is, problems in which the decision rules are not clearly defined. They outperform conventional computers only in these areas.

Because they favor perception functions over reasoning functions and are able to adapt to the changing conditions of the outside world, artificial neural networks should help create a more natural human-machine interface and give expert systems more common sense.

Such an approach can give rise to new forms of artificial intelligence: instead of using the rules that an expert seems to use to make decisions, these machines can learn from a series of examples. "In most cases, the expert cannot explain the rules that govern his decisions," explains Terrence Sejnowski of Johns Hopkins University in Baltimore. "Go and ask a tennis champion how he plays! Those who use rules are precisely the non-experts!"

Neural networks can be combined with expert systems to improve the choice of the next rule to apply.

Most astonishing is the speed with which complex applications can be developed: in less than three months, Terrence Sejnowski designed a machine capable of learning to read aloud, NETtalk. After one night of training on a 1000-word text, it performed like a beginning child and seemed to follow the rules of word pronunciation.

The name NETtalk was not chosen by chance: the firm DEC has developed a text-to-speech conversion system named DECtalk. It required several years of development and represents a large body of research in speech processing and linguistics. This capacity for self-programming should make it possible to limit the growing cost of software in certain projects.

The main applications under study in the laboratory to date include classification systems that copy human perception processes (recognition of characters, shapes, or speech, and automatic stereo reconstruction of images), but also tasks more directly related to thought, such as expert systems that model medical diagnosis or game playing (see backgammon).

In robotics, this approach could at last make robots adaptive and solve the problems of limb coordination and, more generally, all multi-sensor problems suited to a multivariable environment.

Neural networks also lend themselves to transformation tasks such as the coding and decoding of temporal signals (with learning), changes of geometric coordinates (Cartesian/polar), or signal compression. In this sense they resemble the statistical classifiers in common use. But, as Richard Lippmann of MIT's Lincoln Lab explains: "Traditional statistical techniques are not adaptive and impose more restrictive assumptions on the form of the distributions. Neural networks should be more robust for distributions generated by nonlinear or strongly non-Gaussian processes."

A final potential field of application is combinatorial optimization, that is, problems such as air-traffic scheduling or the famous travelling salesman problem: how should one choose the shortest route for a salesman who must visit several cities? For 10 cities there are 181 440 possible routes, and it is possible to determine the best path by computing each distance. But for 30 cities there are more than $10^{30}$ routes. Another technique is needed! In 1982, Hopfield, an eminent physicist at the California Institute of Technology and a member of the Academy of Sciences, literally revived research on neural networks by presenting, in a [communication] of the Academy of Sciences, a [solution to the] travelling salesman problem using a network model known today as the Hopfield model.

Fig. 2. Image processing on the Mark III. The image of an aircraft presented at the input of the network is shown at top left. The image of what the network "thinks" it sees is at bottom right (note that the two aircraft are not identical). Shift and rotation are removed during the processing shown at top right. Optimal discriminants are then used (bottom left) to perform the identification. This illustrates the different functional blocks of the information processing. (Doc. TRW MEAD.)

Fig. 3. Speech recognition on the Mark III. A temporal spectrum of dimension 32 (top) is presented at the input of the network and processed to produce the recognition of the sentences shown in red at bottom right. (Doc. TRW MEAD.)
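
The route counts quoted above for the travelling salesman problem are consistent with the usual formula $(n-1)!/2$ for the number of distinct closed tours through $n$ cities (the starting city and the direction of travel do not matter). The following is a minimal sketch; the function name is our own and the formula is an assumption about how the figures were obtained, since the article does not state it.

```python
import math


def tour_count(n_cities):
    """Number of distinct closed tours through n_cities cities: (n - 1)! / 2."""
    return math.factorial(n_cities - 1) // 2


routes_10 = tour_count(10)
routes_30 = tour_count(30)
```

The count grows factorially, which is why exhaustive comparison of all routes is feasible for 10 cities but not for 30.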

Since the article published by Hopfield in 1982, the revival of neural networks has been steadily confirmed, and articles in the American press have helped fuel interest in the field. Four newsletters have been created this year. Many conferences have taken place in the United States, including the first international conference on neural networks, organized by the engineering society IEEE in San Diego from June 19 to 24, 1987.

[Caption fragment: "... in certain neural networks."]

The field is attracting more and more researchers from varied backgrounds: microelectronics, optics, mathematics, neuroscience, biology, computer science, and psychology. Prestigious figures such as John Hopfield and Carver Mead of Caltech, and microcomputing pioneers such as Federico Faggin, now work on the subject.

Faced with these promising prospects, many firms have invested in research that was long confined to universities. AT&T, IBM, Texas Instruments, TRW (Figs. 2, 3, and 4), General Electric, Motorola, and even Du Pont de Nemours have their own research centers and have signed cooperation programs with universities.

Half a dozen companies recently founded by pioneers of the field have embarked on the development of prototypes and, for some, on the commercialization of applications. These start-ups are financed by West Coast venture capital, always on the lookout for new opportunities, which have become rare in computing since the stagnation of the CAD and AI markets. These start-ups keep close ties with their universities of origin and compete for renowned researchers. In all, the analyst Edward Rosenfeld, who publishes a newsletter on neural network technology in New York, has identified more than 150 companies involved in R&D on neural networks, against 20 only two years ago (Fig. 5).

Three types of approach adopted by these companies can be distinguished.


Software simulation
on a conventional computer


Neural networks can be modeled mathematically using matrices whose
elements represent the connection weights. The network then learns by
modifying the elements of the matrix according to given rules. Many
small companies offer software, generally written in C or Pascal, for
the IBM PC and even the Macintosh. For a long time this approach was the
only one available to researchers wishing to simulate networks (fig. 6).
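
As a rough illustration of this matrix view, the following Python sketch
keeps the connection weights in a square matrix, computes the next state
of every neuron by thresholding the weighted sum of its inputs, and
applies one learning step that modifies the matrix elements. The Hebbian
rule used here, the function names, and the zero thresholds are
assumptions chosen for the example; the article does not say which rules
the commercial packages implement.

```python
def sgn(x):
    """Threshold function: +1 when x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def next_state(weights, thresholds, state):
    """Update all neurons at once from the current state (values +1/-1)."""
    n = len(state)
    return [
        sgn(sum(weights[i][j] * state[j] for j in range(n)) - thresholds[i])
        for i in range(n)
    ]


def hebbian_step(weights, pattern, rate=1):
    """Return a new matrix with w[i][j] increased by rate * p[i] * p[j].

    Illustrative learning rule (an assumption, not taken from the article):
    connections between units that agree are strengthened, and the
    diagonal is left at zero.
    """
    n = len(pattern)
    return [
        [
            weights[i][j] + (rate * pattern[i] * pattern[j] if i != j else 0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# A 3-neuron network that starts with no connections at all.
weights = [[0, 0, 0] for _ in range(3)]
thresholds = [0, 0, 0]
pattern = [1, -1, 1]

weights = hebbian_step(weights, pattern)
print(weights)
print(next_state(weights, thresholds, pattern))
```

After this single learning step the taught pattern is a fixed point of
the small network: presenting it again returns it unchanged.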


Software simulation
on a specialized machine
To speed up software simulation, some firms sell machines fitted with
boards specialized in fast matrix computation and supplied with a
library of programs emulating the main types of network. The Californian
firm SAIC of San Diego, for example, offers a board for the IBM PC/AT
fitted with a fast BIT adder and multiplier organized in a Harvard
architecture with one megabit of fast memory. SAIC plans to adapt this
board to the Macintosh II in the near future. Sold for $30,000 (PC/AT
included), it competes with several other PC boards, in particular the
Anza "neurocomputer" from Hecht-Nielsen Neurocomputer (San Diego).

Fig. 4. - Example of an application on the Mark III. This diagram
illustrates the processing of a time-varying Doppler spectrum (top left)
that makes it possible to identify the reflections from a five-bladed
helicopter. The reconstructed image of what the network thinks it sees
is displayed at bottom right. (Doc. TRW MEAD.)

Fig. 5. - Growth of commercial applications of artificial neural
networks, by year: 1970 (2), 1975 (10), [year missing] (18), 1985 (20),
1987 (150+).

MICRO-SYSTEMES - 87

More conventional in design, the Anza board, built around the 68020
processor and sold for $15,000 (with a Zenith Z-248 PC and a program
library), was presented with a great many more or less convincing
demonstrations at the conference organized by the IEEE in San Diego in
June 1987.

The founder of HNC, Robert Hecht-Nielsen, is one of the pioneers of
industrial applications of neural networks. He headed TRW's artificial
intelligence research center at Rancho Carmel, California, before
founding his own company.

TRW, with the Mark III, was the first company to market a system, in
1986: it is designed to be connected to a computer of the VAX range and
was originally developed for the Department of Defense. Much noticed at
the San Diego conference, the Odyssey board was originally developed by
Texas Instruments for real-time image processing applications. Sold for
$15,000, it installs in the Explorer workstation ($50,000) and,
according to Terrence Sejnowski, runs the NETtalk program 15 times
faster than a VAX 780.

Finally, parallel computers such as the Connection Machine from Thinking
Machines Corporation or the INMOS transputer, although they must not be
confused with neural networks, make excellent tools for simulating
networks. Programmed this summer on a Connection Machine, NETtalk ran
200 times faster than on a VAX 780.

This approach appears to be the best in the short term, given the state
of the technology and of research on networks. It combines speed of
execution with the flexibility needed to test new models easily.





Representation of the brain areas activated by thought for a given task,
showing the relations between these various parts in a healthy subject
(left) and in a patient suffering from neurological disorders (right).
(EEG Systems Laboratory, San Francisco.)

Fig. 6. - Two simulation programs for the Macintosh were shown at the
conference organized by the IEEE in San Diego: Mactivation (illustrated)
and MacBrain (sold for 99 dollars). These programs are an original
introduction to the various applications of neural networks (here,
character recognition). (Doc. Mike Kranzdorf.)

[...] numerous exceptions [...] Terrence Sejnowski. [...] knowing the
string of phonemes [...] modify [...] at the input [...] at a time in
order to [...] output (phoneme) [...] variable connections.

After 12 hours of training on a Ridge 32 minicomputer, the program
"pronounced" 95% of the words correctly [...]. The experiment used a
dictionary of 1,000 words. Learning a dictionary of 20,000 words took a
week of computation.

When a speech synthesizer was connected to the program's output, it
produced generally intelligible speech that was disturbingly reminiscent
of the slight errors of a child just beginning to talk. Most surprising
is that the program incorporates word intonation and syllable
separation. "But the program does not understand the text," Sejnowski is
keen to point out. When it was presented, this machine made a strong
impression because of the speed with which the application had been
developed.

"Having demonstrated the power of this technique last year, I set out to
understand how these phenomena arose."

An analysis of how NETtalk works, and in particular a statistical study
of the computations performed at the 80 hidden units, enabled him to
bring to light the self-organization phenomena of the network during
learning.

Analysis of the network's operation. Display of the sum of the signals
arriving at the input of the hidden units when different character
strings that produce the phoneme /E/ are applied at the input. This
graphical representation is important in helping to grasp the phenomena
that appear when the network is used.

[...] by the NETtalk program. [...] units [...] The input layer is
divided into seven "sensors" under which [...]


Direct implementation


Simulating parallel networks on a sequential computer limits execution
speed and forfeits the fault tolerance of these structures.

Implementing the network directly with physically connected processors
is the goal of most companies in the market. On this point two
approaches compete: electronic implementation and optical
implementation.

AT&T, TRW, and many laboratories such as NASA's Jet Propulsion
Laboratory in Los Angeles are developing VLSI circuits. AT&T is
currently testing a 256-neuron chip inspired by Hopfield's biological
model, made up of 25,000 transistors and 100,000 resistors spread over
0.25 square inch. Neural networks lend themselves readily to VLSI
integration, with patterns of up to [...] µs, compared with the 0.25 µs
common in conventional memories. A new chip with 54 x 54 neurons is
being designed. TRW has already developed about ten chips intended for
the Mark IV, a new generation of computers funded by DARPA (the research
agency of the US Department of Defense).

In this area one start-up is attracting particular attention. Founded in
[...], Synaptics brings together many important figures such as the
neurobiologist Gary Lynch of the University of California, Carver Mead,
a [...] figure in semiconductor design at the California Institute of
Technology, and Federico Faggin, who [...] the Intel microprocessor
before founding Zilog. Synaptics does not yet have products to sell, but
is devoting all its research efforts to the design of custom integrated
circuits. At its founding it received nearly a million dollars from
various venture capital firms.

For the moment, implementation remains limited to networks whose
connection "weights" are fixed in advance.

These approaches seem to herald a return of the analog computers of the
past, this time with all the advantages of miniaturization developed
over the last twenty years for digital computers.

A more revolutionary implementation could be optical. Artificial neural
networks lend themselves very well to optical computing thanks to
optical multipliers and holograms (on this point see Claire Rémy's
article in Micro-Systèmes, March 1987). Optoelectronics sidesteps many
of the problems that arise when one tries to lay out on silicon the
prodigious interconnection of artificial neural networks. The team of
Professor Nabil Farhat, of the University of Pennsylvania, has developed
a radar imaging system using a content-addressable memory (CAM) built
with an optical artificial neural network. The radar and the CAM use a
library of aircraft characteristics and can identify an item in the
library even if the radar supplies only 10% of the information.
Prototypes have also been built by Demetri Psaltis at Caltech and by
companies such as BDM Corporation (McLean, Virginia) and Hughes Aircraft
(Malibu, California). All these applications are still at the research
stage and, according to Clark Guest of the University of San Diego,
optical implementation will not become established for another ten years
or so, when all the components have been developed. Implementation on
VLSI circuits is currently more promising in the short term.
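
Recall from partial data of the kind reported for the content-addressable
memory can be imitated in software. The sketch below is a minimal,
purely illustrative Python model: two patterns are stored with a Hebbian
outer-product rule, and a probe in which most entries are unknown (coded
as 0) is completed by updating one unit at a time until nothing changes.
It says nothing about how the optical system is actually built; the
patterns, the names, and the zero thresholds are assumptions made for
the example.

```python
def store(patterns):
    """Build a symmetric weight matrix from +1/-1 patterns (Hebbian rule)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, probe, max_sweeps=20):
    """Complete a probe by serial updates; 0 in the probe means 'unknown'."""
    state = list(probe)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # stable state reached
            break
    return state


library = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = store(library)

partial = [1, 1, 1, 0, 0, 0, 0, 0]   # only three of eight entries known
print(recall(weights, partial))
```

With the two orthogonal patterns chosen here, the probe settles on the
first stored pattern. Real libraries with many correlated entries would
not behave so cleanly.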

Beyond the choice between electronics and optics, the main problem
facing research teams is this: which structures should be implemented?
Many network models and learning algorithms are currently available, but
their use depends on the chosen application. Moreover, the limits of
neurocomputers are far from being known in detail. It may therefore seem
premature to rush down such a path while research is still in full
development.

Nevertheless, the arrival of these new machines will allow a whole range
of applications to be developed rapidly.

"Industry must quickly find applications and demonstrate the
capabilities of artificial neural networks in order to attract research
funding," declares Robert Hecht-Nielsen, for whom the emergence of a
neurocomputing industry marks an important step toward recognition of
the value of this field of research.

The main customer at present is the military, which funds most of the
research on artificial neural networks. The spectacular application
mentioned by Hecht-Nielsen could well be military, since most of the
funded projects concern "threat recognition". As Bart Kosko, president
of Verac Inc., explains, recognition has many applications: "In a
cockpit, telling the pilot whether the dot on the screen is an enemy, a
friend, or a bird. In space, for SDI, telling missiles from decoys.
Under the sea, detecting submarines and ships amid the background noise
and telling mines from rocks."

Many laboratories are working on this project, and more generally on
multisensor integration processes and expert systems.

There is no national-scale plan for American research on this subject,
but many government agencies fund a few projects, and their grants total
about 20 million dollars a year. The Europeans, for their part, with the
Brain program launched by the EEC, have adopted a more organized
research policy, like the Japanese, whose sixth-generation computer
program aims to steer research toward machines more closely inspired by
the biological models of living organisms.


CONVENTIONAL COMPUTERS

DIGITAL / DISCRETE TIME
Process information coded as 0s and 1s, for precision, by switching
logic gates synchronized by the pulses of a clock.

SEQUENTIAL COMPUTATION
A single processor sequentially handles a few bits of data from the
memory area.

LOCALIZED MEMORY
Store information in an area dedicated to memory. The physical address
makes it easy to retrieve each item of data.

BOOLEAN LOGIC
Make YES/NO decisions based on logic functions.

EXACT RESULT
Find precise answers to a problem, sometimes in prohibitive time.

PROGRAMMABLE BY INSTRUCTIONS
Manipulate data in a structured way. Operations are always under control
and results predictable. Suited to executing sequential tasks. Hard to
program by experience.

SENSITIVE TO HARDWARE FAILURES
The failure of a single component of the machine can have catastrophic
consequences.


NEUROCOMPUTERS

ANALOG / CONTINUOUS TIME
Process information coded as continuous, low-precision analog signals,
by transmission through a network of processors, in real time.

MASSIVELY PARALLEL COMPUTATION
The interconnected processing units all process the data at the same
time.

ASSOCIATIVE MEMORY DISTRIBUTED OVER THE NETWORK
Store information in a distributed way by modifying the weights of the
network's connections. Each item of data automatically recalls the
information related to it.

FUZZY LOGIC
Make weighted decisions from fuzzy, incomplete, or contradictory data.

APPROXIMATE RESULT
Quickly find good approximate solutions to very complex problems.

PROGRAMMABLE BY EXPERIENCE
Spontaneously formulate their own methods of processing information by
self-organization as the connections adapt. Poorly suited to sequential
programming, because recursion and loops are hard to implement in terms
of networks.

TOLERANT OF HARDWARE FAILURES
Performance degrades gradually as components fail, because information
and processing are distributed over several units.


The differences between conventional computers and neurocomputers.
Neurocomputers are better suited than conventional computers to solving
problems that require a large number of constraints to be satisfied
simultaneously. Unlike conventional computers, they use analog
processing and massive parallelism to find good solutions quickly. They
favor speed over precision. The limits and capabilities of
neurocomputers are currently being evaluated in industry and
universities. Many projects are trying to combine neurocomputers with
existing computing technologies to make effective use of their
complementary capabilities.

The Anza board, marketed by HNC, has a Motorola MC68020 microprocessor
running at 20 MHz, an MC68881 floating-point coprocessor, and 4
megabytes of dynamic RAM. It fits in an IBM PC/AT or compatible. The
system, supplied with development software, can implement neural
networks of up to 30,000 neurons and 480,000 interconnections and,
according to HNC, can update 25,000 connections per second during
learning. (Photo Hecht-Nielsen Neurocomputer, San Diego.)


THE RETURN OF THE CONNECTIONISTS

Begun in the 1940s on the basis of the biological study of the brain,
research on artificial neural networks accelerated in 1949, thanks to
the development by Frank Rosenblatt, of Cornell University, of networks
called perceptrons. Made up of a matrix of 400 photoreceptors, the
perceptron could recognize simple shapes. At the time it raised great
hopes, attracting many researchers to this path.

In 1959, Bernard Widrow of Stanford produced a machine called Adaline
(for Adaptive Linear Network), which was used in 1963 for speech
recognition.

In 1969, Marvin Minsky and Seymour Papert of MIT, worried by the
shortage of researchers in artificial intelligence, published a
mathematical study of perceptrons that demonstrated the severe
limitations of these machines. Minsky's influence and the skill of the
critique had the immediate effect of drawing most researchers away from
the field. The two approaches were then competing for research budgets,
and "you would have had to be mad to waste your career on neural
networks when artificial intelligence held out more promise of
delivering rapid results," Minsky explained.

Thereafter, only a few diehards developed new models on the basis of
Widrow's work on nonlinear adaptive filters. "At the time, nobody dared
speak of these networks as artificial neural networks," comments Widrow,
"and our efforts provoked nothing but amused smiles."

Today, twenty years of research in artificial intelligence have allowed
us to assess the strength of symbolic programming but also its
weaknesses in certain areas, and researchers are returning to this
abandoned path. John Hopfield, author of a very influential report
published in 1982 by the American Academy of Sciences, is said to have
brought back 80% of the researchers currently working on the subject.

"The study of Rosenblatt's perceptron and of Widrow's adaptive filters
served as the basis for the research," says Terrence Sejnowski. "Since
then, new types of networks that go beyond the limits of the perceptron
have been brought to light, and learning algorithms such as gradient
backpropagation (introduced in 1982) have considerably advanced the
capabilities of these networks." For this fresh start, researchers now
have substantial computing power and the progress made in
microelectronics and optoelectronics.

Photograph taken by the team of Professor Gunter Gross of North Texas
State University, which has cultured nerve cells on a glass plate 1.8 mm
wide carrying 32 electrodes printed on its surface. The cells, which
come from the spinal cord of a mouse embryo, are placed one by one on
the glass plate. Within four weeks they form a two-dimensional network
and become spontaneously active. This activity can then be measured
using the electrodes. "These measurements fill a 300-megabyte Winchester
disk in less than 2 minutes," comments Gunter Gross. A new project,
carried out in collaboration with the Texan firm Martingale, will
consist of copying the structure of this living network into the
architecture of an artificial neural network. (Photo G. Gross, North
Texas State University.)


The revival of artificial neural networks sometimes generates excessive
enthusiasm. Many researchers remember the failure of the perceptron in
the 1960s and the sudden freeze in research that followed a period of
euphoria. "The applications developed in industry so far remind me of
toys for adults," says Clifford Lawes bluntly. He is with the Office for
Naval Research, which is allocating a budget of 20 million dollars over
five years to accelerated research on neural networks. Clifford Lawes
was not impressed by the demonstrations at the stands of the IEEE
conference and explains: "They solve simple, small-scale problems, and
you are asked to believe that this shows a more complex problem can be
solved." Carver Mead of Caltech has also warned the research community
against this attitude, common among some artificial intelligence
companies.

"These people do not realize the complexity of the problems involved: an
experiment may work wonderfully well in the laboratory but will not be
on the market for ten years," explains Bernard Widrow of Stanford, one
of the pioneers of the field since the 1960s.

The behavior of a network of 100 neurons is certainly different from
that of a network of 10,000. The human brain has more than 10^[...]
neurons, and additional statistical phenomena must appear. Moreover, it
is difficult to determine a priori the number of neurons needed to solve
a given problem.

Robert Dawes, mathematician and founder of Martingale Inc., suggests not
hoping to find applications too quickly. "Few researchers try to
understand how the network arrives at its result. Nobody is going to
follow a computer that cannot explain why it makes a decision."
Intuition is a suspect notion coming from a machine. "What is more, to
date nobody has managed to develop machines that learn without a
teacher."

In general, the potential of this research seems to go well beyond the
short-term applications developed by industry. "The main contribution of
artificial neural networks will be to improve our understanding of how
the brain works," insists Clifford Lawes of ONR. That would seem
natural, since these artificial networks were originally designed to
model the networks observable in brain tissue.

Many teams are working to refine network models that describe various
aspects of human perception. The Los Alamos laboratory is conducting
joint research on the representation of the mammalian visual system and
on a model of the auditory system. Gary Lynch, of the University of
California at Irvine, has proposed a model of the sense of smell.
"Studying nature closely takes us a long way forward," explains Carver
Mead, who is working on a detailed model of the retina with a view to
improving the performance of networks in image processing and developing
a silicon "retina".

Knowledge of nature comes through detailed study of the least complex
parts of the human brain, or of animals such as the frog or the squid.
AT&T is currently studying in detail how the slug learns odors.

Many researchers believe that the plausibility of models with respect to
the brain or the properties of human perception should guide research on
artificial networks.

The Texan start-up Martingale has just won three research contracts
awarded to high-technology small businesses. One of them, called
Biomasscomp, is being developed jointly with Professor Gross of North
Texas State University. It consists of attempting to connect an
artificial neural network to living neuronal tissue in a culture and to
establish two-way communication between the two. Then, using advanced
mathematical optimization techniques, this communication will make it
possible to transfer the architecture of the living tissue to an
artificial network whose parameters can be controlled by the researcher.
The project illustrates a use of artificial neural networks as a new
investigative tool for the neurosciences.

October 1987


Alan Gevins, of the EEG Systems laboratory in California, uses a neural
network to analyze the signals from 64 electrodes spread over a person's
scalp. "This technique makes it possible to measure important aspects of
the functional activity between the various parts of the brain," says
Alan Gevins. "Such a method should soon be usable for the diagnosis and
assessment of patients with neurological disorders."


"This field is at present more multidisciplinary than
interdisciplinary," laments Joel Davis, who heads the research program
of the Office for Naval Research, "and the term 'neural networks' does
not mean the same thing to everyone." The subject requires
interdisciplinary research structures, which rarely exist and take a
long time to set up. The California Institute of Technology recently
created an autonomous center, joining those at Boston University, MIT,
and the University of California at San Diego.
Once the research structures are in place, getting specialists from
different scientific cultures to work together is a delicate matter.
"But in the case of electronics and neuroscience, the language used is
similar and the origins are the same," explains Joel Davis. "Let us not
forget that Volta was a biologist and that Helmholtz was a
physiologist!"

The evolution of research structures for artificial neural networks and
the emergence of a "neurocomputing" industry therefore seem to herald
rapid progress in knowledge in this field and the appearance of a new
generation of computers.

"That's the wonderful side of this research: we are trying to build a
brain!" Bernard Widrow said mockingly at the end of the IEEE conference.
Widrow, like many researchers, shares the enthusiasm of industry but is
aware of the distance separating us from the processing capabilities
observed in the simplest living being. As Robert Dawes would say: "To
date, only the Poet makes poetry!"

C. Durand
Washington correspondent





Polynomial ADALINE Algorithm


PADALINE is a straightforward method
of building and training neural networks










Maureen Caudill 
Sophisticated and difficult prob- 
lems are often categorization prob- 
lems in disguise. Suppose we want 
to develop a system that intelli- 
gently decides whether or not the 
Smiths will be granted a mortgage. 
The system will have to classify the 
Smiths as either desirable or unde- 
sirable candidates for a home mort- 
gage. Suppose we want to design a 
system that will analyze spectrosco- 
pic data of a chemical compound 
and tell us its components. This is
also a classification problem: the
unknown spectral pattern must be
classified as coming from some assortment
of known chemical ingredients.
Such examples abound in the 
real world and include applications 
such as weather forecasting (is to- 
day’s weather state likely to lead to 
rain?), advanced radar systems (is 
this blip an enemy aircraft?), finan- 
cial applications (is this signature 
really Mary Johnson’s?), and indus- 
trial applications (does this bottle 
of shampoo meet our inspection 
standards?). There are so many pat- 
tern classification problems today 
that the research literature teems 
with ideas on how to solve them. A 
number of solutions exist based on 
an enormous number of different 
approaches. 


This article will address one par- 
ticular approach that has interest- 
ing and unique characteristics. The 
polynomial discriminant method, 
called the polynomial ADALINE 
or PADALINE, was developed by 
Donald Specht for his Ph.D. disser- 
tation at Stanford University, Stan- 
ford, California, in the mid-1960s. 

The idea behind the PADA- 
LINE is a variation of a technique 
used by Bernard Widrow (also of 
Stanford University) that he called 


the ADALINE (short for ADAp- 
tive LINear Element). 


Presenting the ADALINE 


The ADALINE is one of the simplest examples of a neural network. You
may have heard something about these computers based (sometimes very
loosely) on our current understanding of the architecture of the brain.
All neural network architectures are parallel computers with many small
processing nodes called neurodes. These nodes are highly interconnected
to form a network of neurodes. The ADALINE is one of the simplest such
systems because in its most basic form it consists of only one neurode.

Figure 1 shows what this might look like: a neurode with a large number
of input signals impinging on it, along with a special input signal
called the mentor line. The pattern elements are input into the
ADALINE, with each element going in along one input line. Each input
line has a weight associated with it that multiplies the incoming
signal. These weight values determine which parts of the pattern the
ADALINE pays the most attention to.

The ADALINE works like this: a pattern is input along the input signal
lines. Each element of the pattern is individually multiplied by the
weight associated with that particular input line. The ADALINE adds up
all of the weighted input signals, both positive and negative, except
the signal on the mentor line. If the resulting weighted sum is less
than some predefined threshold amount (usually 0), the ADALINE outputs
a -1. If the weighted sum is greater than the threshold value, the
ADALINE outputs a +1. The case where the weighted sum equals the
threshold can be decided either way, as long as it is done
consistently; if the weighted sum equals the threshold, the ADALINE
must always either output +1 or -1, but it cannot sometimes output +1
and sometimes output -1.
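
In code, the computation just described comes down to a few lines. The
following Python sketch is an illustration only; the function name, the
argument names, and the choice of +1 for the tie case are assumptions,
not part of the original ADALINE description.

```python
def adaline_output(weights, pattern, threshold=0.0, tie=1):
    """Return +1 or -1 for one input pattern.

    weights and pattern hold one number per input line (the mentor line
    is not included). 'tie' is the output used whenever the weighted sum
    equals the threshold, so that the tie is always broken the same way.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, pattern))
    if weighted_sum > threshold:
        return 1
    if weighted_sum < threshold:
        return -1
    return tie


weights = [0.5, -1.0, 0.25]
print(adaline_output(weights, [1, 1, 1]))    # weighted sum is -0.25
print(adaline_output(weights, [1, -1, 1]))   # weighted sum is 1.75
```

The first pattern falls below the threshold and the second above it, so
the two calls give -1 and +1 respectively.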

What good is all this? Well, the 
ADALINE effectively has classi- 
fied the input pattern! Supposing 
the ADALINE is properly set up, 
then whenever it sees an input pat- 
tern sufficiently close to its model 
pattern, the weighted sum of the 
input signals will be larger than 0 
and the ADALINE outputs a +1, 
meaning that this is an example of 
the pattern. Otherwise, when the 
input pattern is not sufficiently 
close to the ADALINE’s correct 
pattern, it outputs a — 1, meaning 
that this input is not an example of 
the pattern.

The ADALINE acts as a simple 
pattern classifier, sorting input pat- 
terns into one of two categories, 
such as “enemy plane” and “not 
enemy plane” or “rain today” and 
“no rain today.” We can also envi- 
sion that by collecting a number of 
these classifiers together into some- 
thing called a MADALINE (short 
for “Many ADALINES”) we could 
design a system that could, at least 
in theory, sort highly complex pat- 
terns into an arbitrary number of 
categories.


Training the ADALINE 


But let’s go back to the problem 
alluded to earlier. Note that the 
correct classification of patterns 
will depend on getting the assorted 
weights properly set on the input 
lines. Only when we have an ap- 
propriate set of weights can we be 
assured that all examples will have 
weighted inputs that are properly 
either above or below the thresh- 
old value. 

How do we know what weight values to use so the ADALINE sorts into
categories we care about? This may sound strange to those who are
unfamiliar with neural networks, but the answer to this question is
that we train the ADALINE to sort the patterns correctly.

What I have not mentioned is that nearly all neural networks also
literally learn to solve problems. In the ADALINE’s case, the algorithm
used to set the weights is called the Delta rule.

The Delta rule is simple to im- 
plement. We first gather a collec- 
tion of examples of patterns, some 
that should generate a +1 output 
and some that should generate a 
—1 output. We then split this col- 
lection into two parts, usually ran- 
domly, called the training and test 
patterns. The training patterns are 
used to set the weights using the 
Delta rule; the test patterns are lat- 
er used to confirm that the ADA- 
LINE operates properly when 
shown patterns it was not trained 
with. We will put the test patterns 
aside for the moment. Now we are 
ready to train. 

First the weights on the input 
lines (except the special mentor 
line, which has a fixed weight of 
+1) are randomly set to values be- 
tween —1 and +1. The ADA- 
LINE is then presented with each 
of the training patterns one at a 
time. Since we know what the cor- 
rect response is for each of these 
patterns, we use the mentor input 
line to tell the ADALINE the cor- 
rect response. Initially, the ADA- 
LINE ignores the mentor input 
signal and generates whatever ran- 
dom response the weights dictate 
for that pattern. 

This output is then fed back to 
the ADALINE so it can compare 
the actual output with the correct 
output specified by the mentor 
line. If the two agree, no weights are changed and the next pattern is
presented. If the output was wrong, the weights are modified according
to the Delta rule algorithm until the output is correct. Since the
ADALINE is restricted to a ±1 output, the output merely changes sign.

The Delta rule is itself quite a 
simple algorithm. An error value is 
first computed as the difference be- 
tween the correct and actual 
output: 


E = Correct - Actual


This error can have only three possible values. If the actual and
correct outputs agree, the error is 0; if the actual output was -1 and
should have been +1, the error is +2; if the actual output was +1 and
should have been -1, the error is -2. Each weight w is then modified
according to the equation:

$$w_{\text{new}} = w_{\text{old}} + \frac{\beta E X_i}{|X|^2}$$

where β is a constant between 0 and 1, E is the error, and $X_i$ is the
input pattern element for the overall pattern vector X. Each weight
change is computed and the weights adjusted according to this rule.
Listing 1 shows the ADALINE training algorithm. Usually the best value
for the constant β is some value in the range of 0.3 to 0.7.
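As a rough sketch of a single weight adjustment, the snippet below assumes the normalized form of the Delta rule, in which each weight changes by β·E·X_i divided by the squared length of the pattern vector. That normalization, the function name `delta_rule_update`, and the zero-length guard are assumptions made for illustration.

```python
def delta_rule_update(weights, pattern, correct, actual, beta=0.5):
    """Apply one Delta rule step and return the new weights (sketch)."""
    error = correct - actual                 # one of -2, 0, +2
    norm_sq = sum(x * x for x in pattern)    # squared length of the pattern
    if error == 0 or norm_sq == 0:
        return list(weights)                 # nothing to correct
    # Assumed normalized update: w_i <- w_i + beta * E * x_i / |X|^2
    return [w + beta * error * x / norm_sq for w, x in zip(weights, pattern)]


new_weights = delta_rule_update([0.0, 0.0], [1.0, 1.0], correct=+1, actual=-1)
print(new_weights)
```

Under this assumed form, one step moves the weighted sum for the current pattern by β·E, which is why the step may have to be repeated until the output changes sign.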

The idea of this training regimen 
is that in a well-behaved set of ex- 
emplar patterns, the weight 
changes after all our training pat- 
terns have been presented will be 
such that the ADALINE becomes 
more and more accurate at sorting 
the input patterns. If the training is 
sufficient, then using the test pat- 
terns (these are the ones we did not 
use to modify the weights, remem- 
ber) as inputs, without the weight 
change operation, should result in 
correct categorization of each of 
these unknowns. Listing 2 illustrates this overall train-test-train
procedure.

FIGURE 2. Nonlinearly separable problem

The only problem with the 
ADALINE arises from that cau- 
tionary phrase “‘a well-behaved set 
of exemplar patterns.” In fact, only 
when the pattern examples are 
mathematically linearly separable 
can the ADALINE be 100% accurate
in its categorizations. Unfortu-
nately, real-world problems are of- 
ten not linearly separable (Figure 
2). In these cases, this simple 
scheme will not work. 


The PADALINE


Specht’s PADALINE is a more complex variation on the ADALINE in which
the simple linear categorization capability is extended into a
polynomial categorization capability. The basic idea behind the
PADALINE is that the simple linear decision surface of the ADALINE is
replaced by a polynomial surface of arbitrary complexity so any
categorization

LISTING 1.

ADALINE TRAINING ALGORITHM:
Set learning constant, β (0 < β < 1, usually β = 0.5 or so)
For each training pattern
    apply input pattern to adaline inputs and note expected output
    compute weighted sum of input vector components
    if weighted sum > 0
        output = +1
    else
        output = -1
    end if
    compare output to expected output for this pattern
    if actual output ≠ expected output
        Compute error in output (E = expected - actual)
        Adjust weights using Delta Rule until output changes
    end if
do next pattern


LISTING 2.

TRAINING THE ADALINE

repeat
    train the adaline using the algorithm in Listing 1
    test the adaline using test patterns
until (responses to test patterns are satisfactory)
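Listings 1 and 2 can be combined into a short runnable sketch. The version below is an illustration under stated assumptions, not the article's own code: it assumes the normalized Delta rule step (β·E·X_i divided by the squared pattern length), adds `max_steps` and `max_passes` caps so the loops always terminate, and lets the caller pass `initial_weights` instead of the random starting values the text describes. All function names are hypothetical.

```python
import random


def output(weights, pattern):
    """Listing 1 decision: +1 if the weighted sum is above 0, else -1."""
    return 1 if sum(w * x for w, x in zip(weights, pattern)) > 0 else -1


def train_pass(weights, training_set, beta=0.5, max_steps=100):
    """One pass of Listing 1 over all (pattern, expected) pairs."""
    for pattern, expected in training_set:
        norm_sq = sum(x * x for x in pattern)
        if norm_sq == 0:
            continue  # an all-zero pattern cannot be corrected by the rule
        steps = 0
        # Adjust the weights until the output for this pattern changes.
        while output(weights, pattern) != expected and steps < max_steps:
            error = expected - output(weights, pattern)
            weights = [w + beta * error * x / norm_sq
                       for w, x in zip(weights, pattern)]
            steps += 1
    return weights


def accuracy(weights, patterns):
    """Fraction of patterns categorized correctly, with no weight changes."""
    return sum(output(weights, p) == e for p, e in patterns) / len(patterns)


def train_adaline(training_set, test_set, beta=0.5, max_passes=50,
                  initial_weights=None, seed=0):
    """Listing 2: repeat train and test until the test responses are right."""
    if initial_weights is None:
        rng = random.Random(seed)
        n_inputs = len(training_set[0][0])
        initial_weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    weights = list(initial_weights)
    for _ in range(max_passes):
        weights = train_pass(weights, training_set, beta)
        if accuracy(weights, test_set) == 1.0:
            break
    return weights


training_set = [((1.0, 2.0, 1.0), 1), ((1.0, -2.0, 1.0), -1),
                ((1.0, 1.5, -1.0), 1), ((1.0, -1.0, 0.5), -1)]
test_set = [((1.0, 3.0, 0.0), 1), ((1.0, -3.0, 0.0), -1)]
trained = train_adaline(training_set, test_set, initial_weights=[0.0, 0.0, 0.0])
print(accuracy(trained, test_set))
```

The toy data here is linearly separable by construction (the category is the sign of the second component), so a single pass is enough; real data may need many passes or may never reach full accuracy.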












Polynomial ADALINE


problem can be correctly learned. Furthermore, unlike other
categorization techniques, the PADALINE model does not require that all
the training patterns be stored in the system; the patterns can be
presented one at a time, just as in the ADALINE, and then discarded.
This characteristic is, of course, particularly useful when the number
of training patterns is very large or when we do not know exactly how
many training patterns we will have.

The PADALINE method replaces the simple, linear separating surface with
a polynomial formulation. You may recall from college mathematics that
polynomials can represent functions that are far more complex than the
linear function, so it is easy to see that if we can generate a system
using polynomial functions to separate the categories, it would have
greatly enhanced categorization characteristics. Figure 3 shows what
the decision surface might look like for an arbitrary problem.

Unfortunately, polynomials are messy to describe in their fullest
generality. The complete mathematical description of the PADALINE,
while fully understood by researchers, is beyond the scope of this
article. However, their characteristics are of great interest, and they
are amazingly good at difficult categorization problems.

We will restrict ourselves for the 
moment to a discussion of a system 
that makes a simple, yes-no deci- 
sion, called the two-category deci- 
sion problem. In addition, we will 
initially restrict the discussion to a 
simple system that only has two in- 
put elements. For example, this 
might be the x-y coordinates of a 
robot arm for control purposes or
the temperature and air pressure at 
a particular site to be used for 
weather forecasting. Later we will 
see how this might be extended for 
more complex input patterns. 

FIGURE 3. Possible decision surfaces

In the polynomial discriminant method, we generate a polynomial we will
use to discriminate between the two categories A and B. A polynomial is
a series of terms, each of which consists of a constant multiplied by
one or more variables raised to some integer power. For example, one
polynomial might be:


$$c_{20}x^2 + c_{11}xy + c_{01}y + c_{00} \ge 0$$


In this polynomial, the c terms are the constants and the subscripts on
each constant describe the power of x and y corresponding to that
particular constant. Thus $c_{20}$ is the constant for the polynomial
term with x raised to the second power and y missing; $c_{11}$ is the
constant for the term with both x and y raised to the first power, and
so on. It
should be clear that in a general- 
form polynomial, there could be an 
infinite number of such terms and 
a correspondingly infinite number 
of constants. 

When defining a polynomial 
function, we really only need to 
define the values of the constants. 
If a term is missing, the constant 
simply has a value of 0. If we know 
the value of each constant, we can 
completely specify the polynomial. 
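One way to picture "the constants completely specify the polynomial" is to store them in a table keyed by the powers of x and y. The sketch below is illustrative only; the dictionary layout and the name `evaluate_polynomial` are assumptions, and the sample constants are made up.

```python
def evaluate_polynomial(constants, x, y):
    """Evaluate a two-variable polynomial stored as {(i, j): c_ij}."""
    # A term that is missing from the table simply contributes 0.
    return sum(c * (x ** i) * (y ** j) for (i, j), c in constants.items())


# Hypothetical constants: c_20 = 1.0, c_11 = -2.0, c_00 = 0.5
constants = {(2, 0): 1.0, (1, 1): -2.0, (0, 0): 0.5}
print(evaluate_polynomial(constants, 1.0, 2.0))
```

Storing only the nonzero constants keeps the evaluation cost proportional to the number of terms actually retained.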

Of course, in general, the poly- 
nomial can extend to infinitely 
many terms, so we cannot really 
compute the polynomial complete- 
ly. However, we do not need to do 
so since for nearly all reasonable 
cases we can truncate the polyno- 
mial after a few dozen terms and 
still solve our categorization problem.


Defining the constants 


It turns out that, although they 
are a bit messy to write down, we 
know exactly how to define each of 
the polynomial constants for as 
many terms as we want to write 
down. Once we know this polyno- 
mial, we can use it to categorize 
any input pattern we want. Let’s 
look at the definition of the con- 
stants for our simple, two-dimen- 
sional input case before we go to 
the more general case. 

Before we write down the equations for the constants, let’s define a
few terms. Recall that each input pattern X consists of two elements,
$X_1$ and $X_2$, and we have a collection of such patterns that we will
use as a training set. For each of 
these training patterns, we know 
which of the two categories that 
pattern belongs to and how many 
of the training patterns belong in 
each category. Let us suppose, for 
example, that we have a total of 
100 training patterns, and that 60% 
of them are in category A and the 
other 40% are in category B. 

We also want to define a couple 
of special terms that will make the 
notation a bit easier to understand. 
First, we designate a training pattern that belongs to the A category
with the subscript A, and one that
does not with the subscript B. Thus
there will be 60 patterns with 
the A subscript and 40 with the 
B subscript. 

There is a special ratio K, which we define as shown in Figure 4. We
also define the term $L_i$ as the square of the length of the ith
pattern. It is thus equal to:


$$L_i = x_{1i}^2 + x_{2i}^2$$


Finally, we also define a smoothing 
parameter, which we designate by 
s. This parameter can be changed at 
will but is generally useful in the 
range of 0.5 to 10. 

The procedure is quite simple. 
We go through each of the training 
patterns, computing that pattern’s 
contribution to each of the polyno- 
mial constants as we go. Once we 
go through all the training pat- 
terns, we save the resulting polyno- 
mial constants. 

Listing 3 explains how this procedure is performed in general. The
saved polynomial constants can then be used to categorize some new,
unknown patterns. This is done by simply computing the value of the
polynomial for the unknown future pattern. If the polynomial’s value
for that pattern is positive, the pattern is a member of category A; if
not, it is not a member of category A.

Listing 4 illustrates this categorization procedure. All of this sounds
simple enough, and in concept it really is. Unfortunately, however, the
details of this algorithm can get a bit messy mathematically. The
specific procedures are not particularly difficult, but it is somewhat
intimidating to look at the formulas. Don’t be intimidated; the
procedure is not at all difficult to implement in code.

FIGURE 4. Computing the weighting factor, K


LISTING 3.

TO STORE THE PATTERNS USING THE PDM:
define smoothing constant, s
Compute K for the training set
for each pattern in the training set
    Compute the square of the length of the vector, L
    Compute the exponential term [exp(-L/2s²)] for the vector
    for each constant cij to be computed
        if this pattern is an example of category "A"
            add this pattern's contribution to the first
            summation term in the constant formula
        else /* pattern is a "B" */
            add this pattern's contribution to the second
            summation term in the constant formula
        end if
    do next constant
do next pattern
when complete, save the values of the constants


LISTING 4.

TO CATEGORIZE AN UNKNOWN PATTERN USING THE PDM:
for each unknown pattern vector, X
    use the computed constants from Listing 3 to compute
    P(X) for that vector
    if P(X) > 0
        categorize pattern as "A"
    else
        categorize pattern as "B"
    end if
do next unknown pattern


Clearly, the major thing left for us to define is the polynomial. As
pointed out before, defining the constants for any polynomial
completely defines the polynomial itself, so let’s look at how we
compute those constants. We begin with the simplest constant, which
multiplies the 0th power of both X terms. This constant is computed by
the following equation:


$$c_{00} = \frac{1}{m}\sum_{i=1}^{m}\exp\left(-\frac{L_{Ai}}{2s^2}\right) - K\,\frac{1}{n}\sum_{i=1}^{n}\exp\left(-\frac{L_{Bi}}{2s^2}\right)$$

Let’s walk through this briefly. There are m examples of category A in
our training set, and n examples of category B. We have computed K. The
terms $L_{Ai}$ and $L_{Bi}$ merely mean that we compute the length of
each pattern vector as indicated, except that only patterns in the A
category contribute to $L_{Ai}$ and only patterns in the B category
contribute to the $L_{Bi}$ terms. The i subscript just tells us which
training pattern we are currently using. As noted, s is the smoothing
constant and exp is simply the mathematical exponential function. For
easy notation, we will just call the two exponential terms $EXP_A$ and
$EXP_B$. If we do this, we can consider the computation of this
constant:


$$c_{00} = (\text{avg of } EXP_A) - K\,(\text{avg of } EXP_B)$$


We are merely taking the average 
of the exponential term over all the 
examples in the A category and 
subtracting a weighted average of 
the same term over all the exam- 
ples in the B category. 

That wasn’t so bad; let’s compute a few more constants (Figure 5). We
are using the EXP shorthand for the exponential term, and the
subscripts A and B refer to patterns in categories A and B,
respectively. These two terms are almost the same as the $c_{00}$
constant. There is a factor of 1 over the square of the smoothing
constant in front of each; the only other difference is that the
summations involve multiplying the exponential terms by the first
component of the appropriate input patterns.

Let’s look at the general form of each of these constants. We are still
limiting ourselves to the two-dimensional input case. Figure 6 shows
the formula to compute the value of any constant in the polynomial.

FIGURE 5. Computing constants $c_{10}$ and $c_{01}$

FIGURE 6. Two-dimensional input factor formula

FIGURE 7. One vector’s contribution to $c_{11}$

It looks like a mess, doesn’t it? Remember that the constant subscripts
$z_1$ and $z_2$ determine the corresponding powers to which we raise
the first and second components of the input vector, respectively.
There are m examples of A and n examples of B in our training set. The
subscripts A and B on the pattern components X merely remind us that
the summations only sum terms from either the A or B category, as
appropriate. The smoothing factor s is raised to the 2h power in front
of the bracket, where h is the sum $z_1 + z_2$. Finally, remember that
the EXP terms refer to the exponential terms we defined for each A or B
pattern vector.
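The storing and categorizing procedures of Listings 3 and 4 can be sketched for the two-dimensional case as follows. This is a hedged illustration: it assumes the general constant has the form the text describes, namely a scaling factor 1/(z1!·z2!·s^(2h)) with h = z1 + z2, multiplying the average over the A patterns of x1^z1·x2^z2·exp(-L/2s²) minus K times the same average over the B patterns. The function names, the `max_power` cutoff, and the sample data are all assumptions.

```python
import math
from itertools import product


def train_pdm(patterns, s, K, max_power=2):
    """Single pass over ((x1, x2), label) pairs; label is 'A' or 'B'."""
    powers = list(product(range(max_power + 1), repeat=2))
    sums = {z: {"A": 0.0, "B": 0.0} for z in powers}
    counts = {"A": 0, "B": 0}
    for (x1, x2), label in patterns:
        L = x1 * x1 + x2 * x2                  # squared length of the vector
        exp_term = math.exp(-L / (2 * s * s))  # exponential term for the vector
        counts[label] += 1
        for z1, z2 in powers:
            # A patterns feed the first summation, B patterns the second.
            sums[(z1, z2)][label] += (x1 ** z1) * (x2 ** z2) * exp_term
    constants = {}
    for (z1, z2), total in sums.items():
        h = z1 + z2
        scale = 1.0 / (math.factorial(z1) * math.factorial(z2) * s ** (2 * h))
        constants[(z1, z2)] = scale * (total["A"] / counts["A"]
                                       - K * total["B"] / counts["B"])
    return constants


def categorize(constants, x1, x2):
    """Listing 4: category 'A' if the polynomial is positive, else 'B'."""
    p = sum(c * (x1 ** z1) * (x2 ** z2) for (z1, z2), c in constants.items())
    return "A" if p > 0 else "B"


patterns = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
            ((-1.0, -1.0), "B"), ((-1.2, -0.8), "B")]
constants = train_pdm(patterns, s=1.0, K=1.0)
print(categorize(constants, 1.0, 1.0), categorize(constants, -1.0, -1.0))
```

Each training pattern is touched once and then no longer needed; only the running sums are kept, which is the storage property the article emphasizes.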

Let’s do a specific example. We will compute the polynomial constant
$c_{11}$ and assume that we have a training set with 60% A patterns and
40% B patterns out of a total of 100 patterns. We have computed K for
this case and we know it equals 0.67. The constant h becomes:


$$h = z_1 + z_2 = 1 + 1 = 2$$


Suppose we have an example input pattern in the A category with input
vector components of (0.6, 0.2), and we have a smoothing factor of 3
for this system. What is this vector’s contribution to the constant
$c_{11}$? First, since this pattern vector is in the A category, it
contributes nothing at all to the second summation in the constant
definition formula; only B category vectors are added to that
summation. The square of this vector’s length is (0.6)² + (0.2)² or
(0.36 + 0.04) or 0.40.
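The arithmetic of this example can be checked with a few lines of Python. The sketch assumes that a single vector's contribution to the first summation is x1^z1·x2^z2 multiplied by the exponential term exp(-L/2s²) given in Listing 3, and that the scaling factor is 1/(z1!·z2!·s^(2h)); both forms are inferred from the text's description and should be read as assumptions.

```python
import math

x1, x2 = 0.6, 0.2      # example A-category pattern
s = 3.0                # smoothing factor
z1, z2 = 1, 1          # subscripts of the constant c_11

L = x1 ** 2 + x2 ** 2                     # squared length of the vector
exp_term = math.exp(-L / (2 * s ** 2))    # exponential term for this vector
contribution = (x1 ** z1) * (x2 ** z2) * exp_term

h = z1 + z2
scale = 1.0 / (math.factorial(z1) * math.factorial(z2) * s ** (2 * h))

print(L, exp_term, contribution, scale)
```

Note that the contribution is added into the first summation before averaging; the scaling factor is applied only once, after the A and B averages have been combined.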


Thus the amount we add to the first summation for this constant is as
shown in Figure 7. This is a very small amount, as we can see from the
exponential term. After we add all 60 such contributions from each of
the A training vectors, we take their average value by dividing by 60.
We compute the same term for each of the 40 B vectors in the training
set, take the average by dividing the sum of these terms by 40, and
then weight the B result by the value 0.67 (K). After combining the
average A term and the weighted average B term, we multiply by the
scaling factor:

$$\frac{1}{z_1!\,z_2!\,s^{2h}} = \frac{1}{1!\cdot 1!\cdot 3^{4}} = \frac{1}{81}$$

This is the value of the constant $c_{11}$ for the PADALINE.

We have limited ourselves so far 
to inputs with only two compo- 
nents. Most real problems, unfor- 
tunately, have many more than 
two-dimensional inputs. How do 
we use the PADALINE algorithm 
with these inputs? Actually, the 
procedure is identical to the proce- 
dure for the two-dimensional case. 
The only changes are that there are 
more than two subscripts in the 
constant designation, each of 
which is represented with a factori- 
al in the denominator of the scal- 
ing factor, and that the summations 
include each component of the in- 
put vector raised to its appropriate 
power. 

Figure 8 shows what the general 
form of the constant looks like. 
Here the key change has been to 
introduce a p-dimensional input 
vector, with a corresponding num- 
ber of subscripts and powers of the 
various components of the input 
vector. Other than this single 
modification, there is no qualitative difference between the
two-dimensional case and the general multidimensional case.
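For a p-dimensional input, the bookkeeping changes only in that each constant carries p subscripts. The sketch below shows one possible way to enumerate those subscripts and compute the pieces the text describes; the helper names, the total-power cutoff, and the assumed scaling factor 1/(z1!·...·zp!·s^(2h)) are illustrative assumptions.

```python
import math
from itertools import product


def exponent_tuples(p, max_total_power):
    """All subscript tuples (z1, ..., zp) with z1 + ... + zp <= the cutoff."""
    return [z for z in product(range(max_total_power + 1), repeat=p)
            if sum(z) <= max_total_power]


def scale_factor(z, s):
    """Assumed scaling factor: one factorial per subscript, s to the 2h."""
    h = sum(z)
    denominator = s ** (2 * h)
    for zi in z:
        denominator *= math.factorial(zi)
    return 1.0 / denominator


def pattern_term(x, z, s):
    """One pattern's term: each component to its power, times exp(-L/2s^2)."""
    L = sum(xi * xi for xi in x)
    term = math.exp(-L / (2 * s * s))
    for xi, zi in zip(x, z):
        term *= xi ** zi
    return term


print(len(exponent_tuples(3, 2)))
```

With three inputs and terms up to the quadratic there are already 10 constants, which illustrates why the number of terms has to be limited in practice.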


Using the PADALINE 


The PADALINE algorithm has 
one obvious potential problem. Re- 
call that a polynomial can have an 
infinite number of terms and con- 
stants. How do we know how 
many constant terms to compute? 

Specht investigated this and con- 
cluded that it is rarely necessary to 
compute constants for terms higher 
than the quadratic (powers and 
constant subscripts greater than 2) 
and practically never are terms 
higher than the third power re- 
quired. He also found a more re- 
markable thing: when he comput- 
ed 90 or so constants for a 
particular problem, the majority of 
them were very small, with values 
close to 0. By setting these to 0, 
Specht greatly reduced the amount 
of computation needed for un- 
known pattern categorization, 
sometimes to only a few constants. 
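Discarding near-zero constants amounts to a simple filter over the stored table. The snippet below is a minimal sketch; the `tolerance` value and the sample constants are arbitrary assumptions, since the article does not say how close to 0 a constant must be before it is dropped.

```python
def prune_constants(constants, tolerance=1e-3):
    """Keep only constants whose magnitude exceeds the tolerance."""
    return {powers: c for powers, c in constants.items() if abs(c) > tolerance}


# Hypothetical constants keyed by (power of x, power of y)
constants = {(0, 0): 0.5, (1, 0): 1e-5, (0, 1): -0.2, (2, 0): -1e-4}
print(prune_constants(constants))
```

Fewer retained constants means fewer multiplications each time an unknown pattern is categorized.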

In general, computing a larger 
number of constant terms will give 
the polynomial decision surface 
finer discrimination capability; but, 
of course, the more terms we com- 
pute, the greater the computational 
burden on the system. The most 
flexible results can be obtained by 
computing the maximum number 
of constants we can efficiently han- 
dle with the available hardware or 
memory, then discarding constants 
that are near 0 as unnecessary.

It should be evident that the PADALINE algorithm is reasonably
demanding computationally. If we look past the obvious complexity of
the formulas, however, we find an algorithm that is both remarkably
straightforward to implement and deeply powerful in its ability to
categorize unknown patterns. Since the PADALINE algorithm uses a
generalized polynomial function to discriminate between categories, it
can handle many more problems than the linearly separable problems of
the ADALINE. Also, by computing additional constants for the
polynomial, a progressively more accurate approximation to the decision
surface is achieved, so we can achieve arbitrary levels of decision
confidence within the constraints of time and computational capability.
This algorithm also has the advantage that the training set data can be
processed one pattern vector at a time, then discarded, while only
retaining the current values of the various constants being computed.
Furthermore, we only need to process each training vector once, and as
soon as this single pass through the file is complete, we are ready to
handle unknown patterns.

Finally, the actual categorization of unknown patterns is performed in
a remarkably simple fashion, merely by computing the corresponding
value of the polynomial function and then checking for a positive or
negative value. This simple decision process makes the PADALINE
especially attractive for time-constrained problems.


Maureen Caudill is president of Adaptics, a neural network consulting
and training company in San Diego, Calif., which recently introduced
the first computer-aided instruction system for neural networks. She is
also coauthor of Naturally Intelligent Systems: An Introduction to
Neural Networks (Erlbaum Associates).


Simple "Neural" Optimization Networks: An A/D Converter, Signal
Decision Circuit, and a Linear Programming Circuit


Abstract —We describe how several optimization problems can be rapidly 
solved by highly interconnected networks of simple analog processors. 
Analog-to-digital (A/D) conversion was considered as a simple
optimization problem, and an A/D converter of novel architecture was
designed. A/D conversion is a simple example of a more general class of
signal-decision problems which we show could also be solved by
appropriately constructed networks. Circuits to solve these problems
were designed using general principles which result from an
understanding of the basic collective computational properties of a
specific class of analog-processor networks. We also show that a
network which solves linear programming problems can be understood from
the same concepts.





I. INTRODUCTION


WE HAVE shown in earlier work [1], [2] how highly interconnected
networks of simple analog processors can collectively compute good
solutions to difficult optimization problems. For example, a network
was designed to provide solutions to the traveling salesman problem.
This problem is of the np-complete class [3] and the network could
provide good solutions during an elapsed time of only a few
characteristic time constants of the circuit. This computation can be
considered as a rapid and efficient contraction of the possible
solution space. However, a globally optimal solution to the problem is
not guaranteed; the networks compute locally optimal solutions. For the
traveling salesman problem, even among the extremely good solutions,
the topology of the optimization surface in the solution space is very
rough; many good solutions are at least locally similar to the best
solution, and a complicated set of local minima exist. In difficult
problems of recognition and perception, where rapidly calculated good
solutions may be more beneficial than slowly computed globally optimal
solutions, collective computation in circuits of this design may be of
practical use.

We have recently found that several less complicated optimization
problems which are not of the np-complete class can be solved by
networks of analog processors. The
These networks are guaranteed of obtaining globally optimal solutions
since the solution spaces (in the vicinity of specific initial
conditions) have no local minima. The A/D converter is actually one
simple example of a class of problems for which appropriately
constructed collective networks should rapidly provide good solutions.
The general class consists of signal decomposition problems in which
the goal is the calculation of the optimum fit of an integer
coefficient combination of basis functions (possibly a nonorthogonal
set) to an analog signal. The systematic approach we have developed to
design such networks should be more broadly applicable.

Fahlman [4] has suggested a rough classification of parallel-processor
architectures based upon the complexity of the messages that are passed
between processing units. At the highest complexity are networks in
which each processor has the power of a complete von Neumann computer,
and the messages which are passed between individual processors can be
complicated strings of information. The simplest parallel architectures
are of the "value-passing" type. Processor-to-processor communication
between local computations consists of a single binary or analog value.
The collective analog networks considered here are in this class; each
processor makes a simple computation or decision based upon its
analysis of many analog values (information) it receives in parallel
from other processors in the network. Our motivation for studying the
computational properties of circuits with this organization arose from
an attempt to understand how known biophysical properties and
architectural organization of neural systems can provide the immense
computational power characteristic of the brains of higher animals. In
our theoretical modeling of neural circuits [1], [2], [5],
[6], each neuron is a simple analog processor, while the rich
connectivity provided in real neural circuits by the syn- 
apses formed between neurons are provided by the parallel 
communication lines in the value-passing analog processor 
networks. Hence, in addition to designs for conventional 
implementation with electrical components, the circuits 
and design principles described here add to the known 
repertoire of neural circuits which seem neurobiologically 
plausible. In general, a consideration of such circuits pro- 
vides a methodology for assigning function to anatomical 
structure in real neural circuits. 








II. THE A/D CONVERTER NETWORK


We have presented in detail [1], [2], [5] the basic ideas involved in
designing networks of analog processors to solve specific optimization
problems. The general structure of the networks we have studied is
shown in Fig. 1(b). The processing elements are modeled as amplifiers
having a sigmoid monotonic input-output relation, as shown in Fig.
1(a). The function $V_j = g_j(u_j)$ which characterizes this
input-output relation describes the output voltage $V_j$ of amplifier j
due to an input voltage $u_j$. The time constants of the amplifiers are
assumed negligible. However, each amplifier has an input resistor
leading to a reference ground and an input capacitor. These components
partially define (see [1] and [5]) the time constants of the network
and provide for integrative analog summation of input currents from
other processors in the network. These input currents are provided
through resistors of conductance $T_{ij}$ connected between the output
of amplifier j and the input of amplifier i. In order to provide for
output currents of both signs from the same processor, each amplifier
is given two outputs, a normal output and an inverted output. The
minimum and maximum outputs of the normal amplifier are taken as 0 and
1, while the inverted output has corresponding values of 0 and -1. A
connection between two processors is defined by a conductance $T_{ij}$
which connects one of the two outputs of amplifier j to the input of
amplifier i. This connection is made with a resistor of value
$R_{ij} = 1/|T_{ij}|$. (In Fig. 1, resistors connecting two wires are
schematically indicated by squares.) If $T_{ij} > 0$, this resistor is
connected to the normal output of amplifier j. If $T_{ij} < 0$, it is
connected to the inverted output of amplifier j. The matrix $T_{ij}$
defines the connectivity among the processors. The net input current to
any processor (and hence the input voltage $u_i$) is the sum of the
currents flowing through the set of resistors connecting its input to
the outputs of the other processors. Also, as indicated in Fig. 1(b),
externally supplied input currents ($I_i$) are also present for each
processor. In the circuits discussed here, these external inputs can be
constant biases which effectively shift the input-output relation along
the $u_i$ axis and/or problem-specific input currents which correspond
to data in the problem.

We have shown [5] that in the case of symmetric connections
($T_{ij} = T_{ji}$), the equations of motion for this network of analog
processors always lead to a convergence to stable states, in which the
output voltages of all amplifiers remain constant. Also, when the
diagonal elements ($T_{ii}$) are 0 and the width of the amplifier gain
curve (Fig. 1(a)) is narrow (the high-gain limit), the stable states of
a network comprised of N processors are the local minima of the
quantity

$$E = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} T_{ij} V_i V_j - \sum_{i=1}^{N} V_i I_i. \qquad (1)$$

We refer to E as the computational energy of the system. By
construction, the state space over which the circuit operates is the
interior of the N-dimensional hypercube defined by $V_i = 0$ or 1.
However, we have shown that in the high-gain limit networks with
vanishing diagonal connections ($T_{ii} = 0$) have minima only at
corners of this space [5]. Under these conditions the stable states of
the network correspond to those locations in the discrete space
consisting of the $2^N$ corners of this hypercube which minimize E (1).
(Somewhat less restrictive conditions will often suffice, which allow
leeway for nonzero $T_{ii}$. Negative $T_{ii}$ do not necessarily cause
problems.)

Fig. 1. (a) The input-output relation for the processors (amplifiers)
in Fig. 1(b). (b) The network of analog processors. The output of any
neuron can potentially be connected to the input of any other neuron.
Black squares at intersections represent resistive connections
($T_{ij}$'s) between outputs and inputs. Connections between inverted
outputs and inputs represent negative (inhibitory) connections.

Networks of analog processors with this basic organization can be used
to compute solutions to specific optimization problems by relating the
minimization of the problem cost function to the minimization of the E
function of the network. Since the energy function can be used to
define the values of the connectivities ($T_{ij}$) and input bias
currents ($I_i$), relating a specific problem to a specific E function
provides the information for a detailed circuit diagram for the network
which will compute solutions to the problem. The computation consists
of providing an initial set of amplifier input voltages $u_i$, and then
allowing the analog system to converge to a stable state which
minimizes the E function. The solution to the problem is then
interpreted from the final stable state using a predetermined rule.
The A/D converter we shall describe is a specific example
of such an optimization network. For clarity, we will
limit the present discussion to a 4-bit converter. Its wiring
diagram is shown in Fig. 2. The circuit consists of four
amplifiers (only inverting outputs are needed; see below)
whose output voltages will be decoded to obtain the output
binary word of the converter, a network of feedback resistors
connecting the outputs of one amplifier to the inputs
of the others, a set of resistors (top row) which feed
different constant current values into the input lines of the
amplifiers, and another set of resistors (second row) which
inject current onto the input lines of the amplifiers which
is proportional to the analog input voltage x, which is to
be converted by the circuit. For the present we assume that
the output voltages ($V_i$) of the amplifiers can range
between a minimum of 0 V and a maximum of 1 V. Thus as
described above for the variables in (1), the $V_i$ range over
the domain [0, 1]. We further assume that the value of x in
volts is the numerical value of the input which is to be
converted. The converter network is operating properly
when the integer value of the binary word represented by
the output states of the amplifiers is numerically equal to
the analog input voltage. In terms of the variables defined
above, this criterion can be written as

$$\sum_{i=0}^{3} V_i 2^i = x. \qquad (2)$$

The circuit of Fig. 2 is organized so that this expression
always holds.

Fig. 2. The 4-bit A/D converter computational network. The analog
input voltage is x, while the complement of the digital word $V_3V_2V_1V_0$
which is computed to be the binary value of x is read out as the 0 or 1
values of the amplifier output voltages.

The strategy employed in creating this design is to
consider A/D conversion as a simple example of an optimization
problem. If the word $V_3V_2V_1V_0$ is to be the “best”
digital representation of x, then two criteria must be
fulfilled. The first is that each of the $V_i$ have the value of 0
or 1, or at least be close enough to these values so that a
separate comparator circuit can establish digital logic levels.
The second criterion is that the particular set of 1’s and 0’s
chosen is that which “best” represents the analog signal.
This second criterion can be expressed, in a least-squares
sense, as the choice of $V_i$ which minimize the energy
function

$$E = \frac{1}{2}\Big(x - \sum_{i=0}^{3} V_i 2^i\Big)^2 \qquad (3)$$

because the quadratic is a minimum when the parenthesized
term has a minimum absolute value. If this function
is expanded and rearranged, it can be put in the form of (1)
(plus a constant). There would, therefore, be a real circuit
of the class shown in Fig. 1 which would compute by trying
to minimize (3).

However, with this simple energy function there is no
guarantee that the values of $V_i$ will be near enough to 0 or
1 to be identified as digital logic levels. Since (3) contains
diagonal elements of the T-matrix of the form $\propto (V_i)^2$
which are nonzero, the minimal points to the E function
(3) will not necessarily lie on the corners of the space, and
thus represent a digital word (see [5]). Since there are many
combinations of the $V_i$ which can be linearly combined to
obtain x, a minimum can be found which is not at a corner
of the space.

We can eliminate this problem by adding one additional
term to the E function. Its form can be chosen as

$$-\frac{1}{2}\sum_{i=0}^{3} (2^i)^2\,[V_i(V_i - 1)]. \qquad (4)$$

The structure of this term was chosen to favor digital
representations. Note that this term has minimal value
when, for each i, either $V_i = 1$ or $V_i = 0$. Although any set
of (negative) coefficients will provide this bias towards a
digital representation, the coefficients in (4) were chosen so
as to cancel out the diagonal elements in (3). The elimination
of diagonal connection strengths will generally lead to
stable points only at corners of the space. The term (4)
equally favors all corners of the space, and does not favor
any particular digital answer. Thus the total energy E
which contains the sum of the two terms in (3) and (4) has
minimal value when the $V_i$ are a digital representation
close to x.
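This completes the energy function for the A/D converter.
It can be expanded and rearranged into the form

$$E = -\frac{1}{2}\sum_{j=0}^{3}\sum_{\substack{i=0 \\ i \neq j}}^{3} (-2^{(i+j)})\, V_i V_j - \sum_{i=0}^{3} (-2^{(2i-1)} + 2^i x)\, V_i. \qquad (5)$$

This is of the form of (1), apart from the additive constant
$x^2/2$ which does not affect the location of the minima, if we
identify the connection matrix elements and the input currents as

$$T_{ij} = -2^{(i+j)} \;\; (i \neq j), \qquad I_i = (-2^{(2i-1)} + 2^i x). \qquad (6)$$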
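
As an illustrative check (not part of the original circuit description), the following Python sketch builds the connection matrix and input currents of (6) for the 4-bit converter and confirms numerically that an energy of the form (1) differs from the sum of the least-squares term (3) and the digital-favoring term (4) only by the constant $x^2/2$. The helper names (`ad_network`, `network_energy`, `least_squares_energy`, `best_corner`) are hypothetical and introduced only for this sketch, and the brute-force search over corners is a stand-in for the analog dynamics, not a model of them.

```python
from itertools import product


def ad_network(x, n_bits=4):
    """Connection matrix T and input currents I as in (6), with T_ii = 0."""
    T = [[0.0 if i == j else -(2.0 ** (i + j)) for j in range(n_bits)]
         for i in range(n_bits)]
    I = [-(2.0 ** (2 * i - 1)) + (2.0 ** i) * x for i in range(n_bits)]
    return T, I


def network_energy(T, I, V):
    """Energy of the form (1): -1/2 sum_ij T_ij V_i V_j - sum_i I_i V_i."""
    n = len(V)
    quadratic = sum(T[i][j] * V[i] * V[j] for i in range(n) for j in range(n))
    linear = sum(I[i] * V[i] for i in range(n))
    return -0.5 * quadratic - linear


def least_squares_energy(x, V):
    """Sum of the fit term (3) and the digital-favoring term (4)."""
    word = sum(v * 2.0 ** i for i, v in enumerate(V))
    fit = 0.5 * (x - word) ** 2
    digital = -0.5 * sum((2.0 ** i) ** 2 * v * (v - 1.0) for i, v in enumerate(V))
    return fit + digital


def best_corner(x, n_bits=4):
    """Exhaustive search for the lowest-energy corner (V_0, ..., V_3)."""
    T, I = ad_network(x, n_bits)
    return min(product((0, 1), repeat=n_bits),
               key=lambda V: network_energy(T, I, V))
```

For any point of the hypercube, `network_energy(T, I, V) + x**2 / 2` should agree with `least_squares_energy(x, V)` up to rounding, and `best_corner(x)` returns the bits in the order $V_0, V_1, V_2, V_3$.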


The complete circuit for this 4-bit A/D converter with
components as defined above is the network shown in Fig.
2. The inverting output of each amplifier is connected to
the input of the other amplifiers through a resistor of
conductance $2^{(i+j)}$. The other input currents to each
amplifier are provided through resistors of conductance $2^i$
connected to the input voltage x and through resistors of
conductance $2^{(2i-1)}$ connected to a -1-V reference
potential. These numbers for the resistive connections on the
feedback network and the input lines represent the
appropriate relative conductances of the components and
assume that the constant terms in the input currents are
provided by connecting the input lines through resistors to
a -1-V reference potential, that the minimum and maximum
output voltages of the amplifiers are to be 0 and 1 V,
and that the analog input voltage to be digitized is in the
range (-0.5, 15.5) V. When building a real circuit, the
values of the resistors chosen should satisfy the relative
conductances indicated in the figure and in the above
equations, but their absolute values will depend upon the
real voltage rails of the amplifiers, the specific input
voltage range to be digitized, and reasonable values for the
power dissipation. If the real output voltage range for the 
amplifiers is [0,V,,], the voltage range to be digitized is 
[0,V,,], and the reference voltage to be used for the con- 
stant input currents is —V,, then it is straightforward to 
show that the relative conductances (which must now only 
be scaled for power dissipation) for the feedback connec- 
tions are 
+) 


T, 
° Vee 





while the input voltage x will be fed into the ith amplifier 
through a resistor of conductance 2{4*/V,,, and the con- 
stant current is provided through resistors of conductance 
(20-9 +(2@-D7Y,)) connected to the —Vp reference 
voltage. 

The ability of the network to compute the correct digital
representation of x was studied in a series of computer
experiments and actual circuit construction. In the computer
experiments, the dynamic behavior of the network
was simulated by integration of the differential equations
which describe the circuit (for details, see [1], [5]). The
convergence of the network was studied as a function of
the analog input x, for 160 different values contained in
the interval -0.5 to 15.5 V. The digital solutions computed
at a fixed value of x depend upon the initial conditions of
the network. These initial conditions are defined by the
input voltages ($u_i$) on the amplifiers at the time that the
calculation is initiated. In Fig. 3 is plotted the value of the
binary word $V_3V_2V_1V_0$ computed by the network as a function
of the value of (x + 0.5) for the initial conditions
$u_i = 0$. The response is the staircase function characteristic
of an A/D converter. In a real circuit, separate electronics
which would ground the input lines of the amplifiers before
each convergence would be required to implement the
initial conditions ($u_i = 0$) used in these simulations. If the
input lines are not zeroed before each calculation, then the
circuit exhibits hysteresis as the input voltage x is being
continuously varied. For example, if x is slowly turned up
through the same series of values used in the calculation of
Fig. 3, but, instead of zeroing the input lines before a
simulated convergence, we allow the $u_i$ to retain the values
stabilized at the end of the previous calculation, we obtain
the response shown in Fig. 4. Slowly turning down the x
input from its maximum value would provide a response
which is the “inverse” of Fig. 4. (The value for any x, in
the experiment with x descending, is equal to 15.0 minus
the value for (16.0 - x) in the experiment with x ascending.)
Some stable states of the network are skipped under
this set of initial conditions.

Fig. 3. The digital word computed in simulations of the circuit shown
in Fig. 2 as a function of the analog input voltage x. The initial
condition for each of the calculations is $u_i = 0$, for all i.

Fig. 4. The results of a calculation similar to that described in Fig. 3
except that the initial conditions were determined by the $u_i$ which
stabilized during the previous calculation. Calculations were performed
with monotonically increasing values of the analog input voltage
starting at x = 0 V.

One can understand this hysteresis, and its absence for
the $u_i = 0$ initial conditions, by considering the topology of
the energy surface for fixed x and how it changes as x is
varied. In Fig. 5 is shown a stylized representation of the
energy surface for two different x values. The energy at
specific locations in state space is represented, with energy
value along the vertical axis. Different corners of state
space near the global minimum in E (with value E = 0) are
indicated along the curve by the set of indices $V_3V_2V_1V_0$. As
shown in Fig. 5(a), the energy function for x = 7 V has a
deep minimum at the corner of state space which is the
digital representation of 7, and has local minima at higher
E values at the digital representations of 6 and 8.
Although, as shown above (Fig. 3), the circuit dynamics can
lead from a location in state space corresponding to all
$u_i = 0$ to that corresponding to the deep minimum, if x is
changed to 8 V while the $u_i$ are at the x = 7 V corner, then
although the energy surface will change to that as shown in
Fig. 5(b), the system will remain stuck in the now-local
minimum at the corner corresponding to 7. However, if the
circuit is again allowed to compute from the initial conditions
$u_i = 0$, but now with x = 8 V, the correct deep
minimum can be obtained. The local minima are a direct
consequence of the term (4) in the E function which forces
the output voltages to be digital. If this term were not
present, the $V_i$ would still represent a valid set of coefficients
for the linear approximation of the sum (2) to the analog
value x, but the solution would in general not be at one of the
corners of the solution space.

Fig. 5. A schematic drawing of the energy surface in the vicinity of the
global minima for two analog input voltages.
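
The hysteresis argument can be made concrete with a small Python sketch. It rests on a simplifying assumption that is not stated in the text: in the high-gain limit, and neglecting the leakage through the input resistor, a corner is treated as stable when the net input current $\sum_{j \neq i} T_{ij}V_j + I_i$ from (6) is positive for every amplifier with $V_i = 1$ and negative for every amplifier with $V_i = 0$. The function names are hypothetical.

```python
from itertools import product


def input_drive(x, V):
    """Net input current to each amplifier at corner V, using T_ij and I_i from (6)."""
    n = len(V)
    drive = []
    for i in range(n):
        feedback = sum(-(2.0 ** (i + j)) * V[j] for j in range(n) if j != i)
        bias = -(2.0 ** (2 * i - 1)) + (2.0 ** i) * x
        drive.append(feedback + bias)
    return drive


def is_stable_corner(x, V):
    """Assumed high-gain test: the drive must hold every output at its current rail."""
    return all((u > 0.0) if v == 1 else (u < 0.0)
               for u, v in zip(input_drive(x, V), V))


def stable_words(x, n_bits=4):
    """Integer values of all corners (V_0, ..., V_3) that pass the stability test."""
    words = []
    for V in product((0, 1), repeat=n_bits):
        if is_stable_corner(x, V):
            words.append(sum(v << i for i, v in enumerate(V)))
    return sorted(words)
```

Within this sketch, `stable_words(7.0)` and `stable_words(8.0)` both return `[7, 8]`: the word 7 is still a stable corner when x has been raised to 8 V, which is the situation described for Fig. 5(b), and which corner is reached depends on the initial conditions.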


III. THE DECOMPOSITION/DECISION PROBLEM


Many problems in signal processing can be described as
the attempt to detect the presence or absence of a waveform
having a known stereotyped shape and amplitude in
the presence of other waveforms and noise. Circuits which
are similar to that described above for the A/D converter
can be constructed for which the minimal energy state
corresponds to a decision about this signal decomposition
problem. For example, consider the problem of decomposing
a time-dependent analog signal which results from the
temporal linear summation of overlapping stereotype
Gaussian pulses of known but differing width. A typical
summed signal is shown in Fig. 6(a). In Fig. 6(b) is shown
the individual pulses which when added together give the
signal in Fig. 6(a). The decomposition/decision problem is
to determine this particular decomposition of the signal in
Fig. 6(a), given the knowledge of the individual stereotype
forms.

Fig. 6. (a) Analog signal comprised of a linear summation of Gaussian
pulses of different width and peak location. The pulses summed in (a)
are explicitly illustrated in (b).

To make the problem specific, we assume that
N = 100 time points of analog data ($x(i)$; $i = 1, \ldots, N$) have
been recorded, as indicated by the filled circles in Fig. 6(a),
and that the set of basis functions defining the possible
“pulses” in Fig. 6(b) are the Gaussian functions of the
form

$$\epsilon_{\sigma t}(i) = e^{-[(i - t)/\sigma]^2}. \qquad (10)$$

We will let the width parameter $\sigma$ take on a finite number
of possible values, while the peak position (t) of the pulse
can be at any one of the N time points. Since the basis set
is specified by the width and peak position parameters, the
amplifiers used in the decomposition/decision network can
be conveniently indexed by the double set of indices $\sigma, t$.
In describing the decomposition, each of these basis functions
will have a digital coefficient ($V_{\sigma t}$) which corresponds
to the output of an amplifier in the network and which
represents the presence or absence of this function in the
signal to be decomposed. An energy function which defines
an analog computational network and which is minimum
when this decomposition/decision problem is solved is


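$$E = \frac{1}{2}\sum_{i=1}^{N}\Big(x_i - \sum_{\sigma}\sum_{t=1}^{N} V_{\sigma t}\,\epsilon_{\sigma t}(i)\Big)^2 - \frac{1}{2}\sum_{\sigma}\sum_{t=1}^{N}\Big(\sum_{i=1}^{N}\epsilon_{\sigma t}(i)^2\Big)\,[V_{\sigma t}(V_{\sigma t} - 1)] \qquad (11)$$

with the basis functions as defined in (10). This expression
is of the form (1) and, therefore, defines a set of connection
strengths ($T_{\sigma t, \sigma' t'}$) and input currents ($I_{\sigma t}$) for each
amplifier.
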
A schematic diagram of this computational network is
shown in Fig. 7. The signals $x_i$ enter the network in
parallel (for a time-varying signal this could be accomplished
with a delay line) and produce currents in the input
lines of the amplifiers through resistors which define the
ith “convolution” component in the expression (13) above.
A single resistor for each input connected to a reference
voltage can provide the constant bias terms.

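The energy function presented above for a Gaussian
pulse decomposition/decision circuit can be generalized. If
$\boldsymbol{\epsilon}_k$; $k = 1, \ldots, n$ are a set of basis functions which span the
signal space $\mathbf{X}$, then consider the function

$$E = \frac{1}{2}\Big(\mathbf{X} - \sum_{k=1}^{n} V_k \boldsymbol{\epsilon}_k\Big)\cdot\Big(\mathbf{X} - \sum_{k=1}^{n} V_k \boldsymbol{\epsilon}_k\Big) - \frac{1}{2}\sum_{k=1}^{n} (\boldsymbol{\epsilon}_k\cdot\boldsymbol{\epsilon}_k)\,[V_k(V_k - 1)]. \qquad (7)$$

This function describes a network which has an energy
minimum (with E = 0) when the “best” digital combination
of basis functions are selected (with $V_k = 1$) to describe
the signal. The expression (7) can be expanded and
rearranged to give

$$E = \frac{1}{2}\sum_{k=1}^{n}\sum_{j \neq k} (\boldsymbol{\epsilon}_k\cdot\boldsymbol{\epsilon}_j)\, V_k V_j - \sum_{k=1}^{n}\Big[\mathbf{X}\cdot\boldsymbol{\epsilon}_k - \frac{1}{2}(\boldsymbol{\epsilon}_k\cdot\boldsymbol{\epsilon}_k)\Big] V_k + \frac{1}{2}\,\mathbf{X}\cdot\mathbf{X}. \qquad (8)$$

This is a function which is comprised of terms which are
linear and quadratic in the $V_k$’s. It is, therefore, of the form
(1) (plus a constant), if we define

$$T_{kj} = -(\boldsymbol{\epsilon}_k\cdot\boldsymbol{\epsilon}_j) \;\; (k \neq j), \qquad I_k = \Big[\mathbf{X}\cdot\boldsymbol{\epsilon}_k - \frac{1}{2}(\boldsymbol{\epsilon}_k\cdot\boldsymbol{\epsilon}_k)\Big]. \qquad (9)$$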


Hence, for the general decomposition/decision problem 
mapped onto the computational network in Fig. 1, the 
connection strengths between amplifiers correspond to the 
dot products of the corresponding pairs of basis functions 
while the input currents correspond to the convolution of 
the corresponding basis function with the signal and the 
addition of a constant bias term. 
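
A minimal Python sketch of this mapping is given below. It assumes that the connection strength between two amplifiers is the negative dot product of their basis functions (with no self-connection) and that the input current is the dot product of the signal with the basis function minus half the squared norm of the basis function; that bias is the choice that reproduces the A/D converter currents of (6). The names `dot` and `decomposition_network` are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def decomposition_network(basis, signal):
    """Return (T, I) for a list of basis vectors and one signal vector.

    T[k][j] = -(basis_k . basis_j) for k != j, and 0 on the diagonal.
    I[k]    = signal . basis_k - 0.5 * (basis_k . basis_k)   (assumed bias term)
    """
    n = len(basis)
    T = [[0.0 if k == j else -dot(basis[k], basis[j]) for j in range(n)]
         for k in range(n)]
    I = [dot(signal, basis[k]) - 0.5 * dot(basis[k], basis[k]) for k in range(n)]
    return T, I


# The 4-bit A/D converter as a special case: one sample, basis values 2**i.
ad_basis = [[2.0 ** i] for i in range(4)]
T, I = decomposition_network(ad_basis, [5.0])
```

With the one-dimensional basis $2^i$ and the signal x = 5, the sketch gives $T_{12} = -8$ and $I_0 = 4.5$, matching $-2^{(i+j)}$ and $-2^{(2i-1)} + 2^i x$.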

The A/D converter described earlier can be seen to be a
simple example of this more general circuit. In the A/D
case, the signal is one-dimensional and consists of only an
analog value sampled at a single time point. The basis
functions are the values $2^i$; $i = 0, \ldots, (n-1)$ which are a
complete set over the integers in the limited domain
$[0, 2^n - 1]$. The binary word output of the circuit is comprised
of the coefficients which describe a linear summation
of the basis functions which is closest, in the least
squares sense, to the input signal.

Fig. 7. The general organization of a computational network which can
be used to solve a multipoint decomposition problem with nonorthogonal
basis functions. The outputs of each of the amplifiers represents
the presence ($V_{\sigma t} = 1$) or absence ($V_{\sigma t} = 0$) of a pulse of width $\sigma$
and peak location t in the signal trace.

For the A/D converter problem and the Gaussian
decomposition/decision network just described, the basis
functions which span the signal space are not orthogonal.
For an orthogonal set, by definition, the connection
strengths (9) would all vanish. For example, if the signal
consists of N analog-sampled points of a differentiable
function, and the basis functions were sines and cosines (a
Fourier decomposition network), then the computational
circuit would have no feedback connections since these
basis functions are orthogonal. In this case, the independent
computations made by each amplifier are the convolution
of the signal with the particular basis function
represented. This is just the familiar rule for calculating
Fourier coefficients: all decisions are independent. In general,
one can interpret the connection strengths in the
decomposition/decision networks as the possible effect of
one decision being tested ($V_i$) on another ($V_j$); these effects
should be zero for orthogonal basis functions.
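
The contrast between orthogonal and overlapping basis sets can be checked numerically. The Python sketch below is illustrative only: the sampled sine/cosine basis, the Gaussian pulse shape $e^{-[(i-t)/\sigma]^2}$, and all function names and parameter values are assumptions chosen for the example.

```python
import math


def fourier_basis(n_points, n_freqs):
    """Sampled cosines and sines at frequencies 1..n_freqs (needs n_freqs < n_points / 2)."""
    basis = []
    for k in range(1, n_freqs + 1):
        basis.append([math.cos(2 * math.pi * k * i / n_points) for i in range(n_points)])
        basis.append([math.sin(2 * math.pi * k * i / n_points) for i in range(n_points)])
    return basis


def gaussian_basis(n_points, centers, sigma):
    """Sampled Gaussian pulses of one width, one pulse per peak position."""
    return [[math.exp(-((i - t) / sigma) ** 2) for i in range(n_points)]
            for t in centers]


def max_offdiagonal_coupling(basis):
    """Largest |basis_a . basis_b| over distinct pairs, i.e. the largest feedback strength."""
    worst = 0.0
    for a in range(len(basis)):
        for b in range(len(basis)):
            if a != b:
                coupling = sum(p * q for p, q in zip(basis[a], basis[b]))
                worst = max(worst, abs(coupling))
    return worst
```

For the sampled Fourier basis the largest coupling is zero up to rounding error, so no feedback resistors would be needed, whereas two overlapping Gaussian pulses give a coupling of order one.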


IV. THE LINEAR PROGRAMMING NETWORK 


The linear programming problem can be stated as the
attempt to minimize a cost function

$$\mathbf{A}\cdot\mathbf{V}$$

subject to constraints among the variables:

$$\mathbf{D}_j\cdot\mathbf{V} \ge B_j, \quad j = 1, \ldots, M, \qquad \mathbf{D}_j = (D_{j1}, D_{j2}, \ldots, D_{jN}) \qquad (15)$$

where the $\mathbf{D}_j$, for each j, contain the N variable coefficients
in a constraint equation and the $B_j$ are the bounds.
Although we know of no way to directly cast this problem
into the explicit form of (1) so that a network of the form
shown in Fig. 1 could be used to compute solutions to the
problem, we can understand how the circuit in Fig. 8,
illustrated for the specific case of two variables (N = 2)
and four constraint equations (M = 4), can rapidly compute
the solution to this optimization problem, by a variation
of a mathematical analysis used earlier [5].
In the circuit of Fig. 8, the N outputs ($V_i$) of the
left-hand set of amplifiers will represent the values of the
variables in the linear programming problem. The components
of A are proportional to input currents fed into these
amplifiers. The M outputs ($\psi_j$) of the right-hand set of
amplifiers represent constraint satisfaction. As indicated in
the figure, the output ($\psi_j$) of the jth amplifier on the
right-hand side injects current into the input lines of the $V_i$
variable amplifiers by an amount proportional to $-D_{ji}$,
the negative of the constraint coefficient for the ith variable
in the jth constraint equation. Each of the M $\psi_j$
amplifiers is fed a constant current proportional to the jth
bound constant ($B_j$) and receives input from the ith variable
amplifier by an amount proportional to $D_{ji}$. Like all
of the amplifiers in Fig. 1, each of the $V_i$ amplifiers in the
linear programming network has an input capacitor $C_i$ and
an input resistor $\rho_i$ in parallel, which connect the input line
to ground. The input-output relations of the $V_i$ amplifiers
are linear and characterized by a linear function $g_i$ in the
relation $V_i = g(u_i)$. The $\psi_j$ amplifiers have the nonlinear
input-output relation characterized by the function

$$\psi_j = f(u_j), \qquad u_j = \mathbf{D}_j\cdot\mathbf{V} - B_j,$$

where

$$f(z) = 0, \;\; z \ge 0; \qquad f(z) = -z, \;\; z < 0. \qquad (16)$$
This function provides for the output of the $\psi$ amplifiers to
be a large positive value when the corresponding constraint
equation it represents is being violated. (The specific form
of f(z) used here was chosen for convenience in building a
corresponding real circuit and the stability proof only
depends upon f being a function of the variable $z = \mathbf{D}_j\cdot\mathbf{V} - B_j$
(see below).) If we assume that the response time of
the $\psi_j$ is negligible compared to that of the variable
amplifiers, then the circuit equation for the variable
amplifiers can be written

$$C_i\frac{du_i}{dt} = -A_i - \frac{u_i}{R} - \sum_{j} D_{ji}\, f(\mathbf{D}_j\cdot\mathbf{V} - B_j). \qquad (17)$$

Fig. 8. The organization of a network which will solve a 2-variable
4-constraint linear programming problem.

Now consider an energy function of the form

$$E = (\mathbf{A}\cdot\mathbf{V}) + \sum_{j} F(\mathbf{D}_j\cdot\mathbf{V} - B_j) + \sum_{i}\frac{1}{R}\int_0^{V_i} g^{-1}(V)\,dV, \qquad (18)$$

where $f(z) = dF(z)/dz$.
Then the time derivative of E is

$$\frac{dE}{dt} = \sum_{i}\frac{dV_i}{dt}\Big[\frac{u_i}{R} + A_i + \sum_{j} D_{ji}\, f(\mathbf{D}_j\cdot\mathbf{V} - B_j)\Big]. \qquad (19)$$

But, substituting for the bracketed expression from the
circuit equation of motion for the $V_i$ amplifiers (17) gives

$$\frac{dE}{dt} = -\sum_{i} C_i\,\frac{dV_i}{dt}\,\frac{du_i}{dt} = -\sum_{i} C_i\, g^{-1\prime}(V_i)\Big(\frac{dV_i}{dt}\Big)^2. \qquad (20)$$

Since $C_i$ is positive and $g^{-1}(V_i)$ is a monotone increasing
function, this sum is nonnegative and

$$\frac{dE}{dt} \le 0; \qquad \frac{dE}{dt} = 0 \;\rightarrow\; \frac{dV_i}{dt} = 0, \text{ for all } i. \qquad (21)$$

Thus, as for the network in Fig. 1, the time evolution of the
system is a motion in state space which seeks out a minimum
of E and stops. The network in Fig. 8 should not show any
oscillation even though there are nonsymmetric connection
strengths between the two sets of $V_i$ and $\psi_j$ amplifiers, as
long as the $\psi_j$ are sufficiently fast.
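
The descent behavior can be illustrated with a rough forward-Euler simulation in Python. This is a sketch under stated assumptions rather than a model of the circuit in Fig. 8: the variable amplifiers are taken as $V_i = u_i$, the constraint amplifiers are assumed to respond instantaneously, and the feedback is taken as `gain * z` for a violated constraint (z < 0) and zero otherwise, i.e. the derivative of a quadratic penalty on violations. That sign convention, the `gain` parameter, the component values, and the function names are all assumptions of this sketch; it is not claimed to reproduce the amplifier polarities of the actual hardware.

```python
def constraint_feedback(z, gain):
    """Assumed constraint-amplifier response: 0 if satisfied (z >= 0), gain * z if violated."""
    return gain * z if z < 0.0 else 0.0


def simulate_lp(A, D, B, gain=100.0, R=1000.0, C=1.0, dt=1e-3, steps=20000):
    """Euler integration of C du_i/dt = -A_i - u_i/R - sum_j D[j][i] * psi_j."""
    n, m = len(A), len(D)
    u = [0.0] * n                          # initial condition u_i = 0
    for _ in range(steps):
        V = list(u)                        # linear variable amplifiers, V_i = u_i
        psi = [constraint_feedback(sum(D[j][i] * V[i] for i in range(n)) - B[j], gain)
               for j in range(m)]
        for i in range(n):
            du_dt = (-A[i] - u[i] / R - sum(D[j][i] * psi[j] for j in range(m))) / C
            u[i] += dt * du_dt
    return list(u)


# Minimize x + 2y subject to x >= 0, y >= 0, x + y >= 2 (optimum at x = 2, y = 0).
solution = simulate_lp(A=[1.0, 2.0],
                       D=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                       B=[0.0, 0.0, 2.0])
```

Because the penalty gain is finite, the state settles slightly outside the feasible region, within about 1/gain of the optimum vertex; a larger gain tightens this at the cost of a smaller stable time step.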

Fig. 9. A plot of the measured values of x and y for the linear
programming network described in the text, as a function of the gradient
of the optimization plane. The set of gradients is depicted by their
projections onto the x, y plane, drawn as vectors from the origin.

A small computational network was constructed out of
conventional electronic components to solve a 2-variable
problem with four constraints using the network organization
of Fig. 8. A simple op amp/diode active clamp circuit
was used to provide the nonlinear f input-output function.
The equations of constraint for the two variables (x and y)
were


yss 
5 35

Do 

5 35 
These equations defined the connection strengths ($D_{ji}$) and
the input currents ($B_j$) for the $\psi_j$ amplifiers. In the xy
plane characterizing the solution space, they defined the
simplex shown in Fig. 9. A microcomputer-based data
acquisition system was used to control the circuit and to
measure the output voltages of the $V_1$ and $V_2$ amplifiers
which corresponded to the x, y solutions, as a function of
rapidly changing sets of input currents supplied to the
input lines of these amplifiers. As indicated in Fig. 8, these
input currents correspond to the coefficients $A_i$ in the cost
function which is to be minimized. For this simple 2-variable
problem, the cost function can be geometrically
thought of as a plane defined by the equation $z = A_1 x + A_2 y$
hovering above the xy plane, and the direction of the
gradient of that plane $A_1\hat{x} + A_2\hat{y}$ can be represented by a
vector in the xy solution plane. The lowest point on the
portion of this cost plane lying above the feasible solution
space in the xy plane lies above the optimum simplex
point. As the cost function is changed, the cost plane tilts
in a new direction, the gradient projection in the xy plane
rotates, and the optimum simplex point may also change.
We recorded the values of x and y computed by the
network for a set of cost functions. The operating points of
the circuit are plotted in Fig. 9. Each diamond represents
the location in the xy space at which the network stabilized
as the cost-plane gradient vector (indicated by the array
of short line segments emanating from the origin) was
swept in a circle. The circuit was stable at the optimum
simplex points corresponding to the correct constrained
choice for a given gradient direction.

Fig. 10. The trajectory of x and y for the circuit described in the text as
the gradient (indicated by the two vectors from the origin) is rapidly
switched.


In another experiment, the variable amplifiers were
artificially slowed using large input capacitance and the
trajectory followed by $V_1$ and $V_2$ was collected by rapid
data sampling as the gradient was rapidly switched in
direction. The trajectory is shown in Fig. 10. The network
follows the gradient until it reaches a constraint wall which
it then follows until the optimum simplex point is reached. Since
the solution space is always convex for linear programming
problems, the network is guaranteed to find the optimum
solution.


V. CONCLUSIONS 


We have demonstrated how interconnected networks of
simple analog processors can be used to solve
decomposition/decision problems and linear programming problems.
Networks for both problems were designed using conceptual
tools which allow one to understand the influence of
complicated feedback in highly interconnected networks of
analog processors. There appears to be a large class of
computation problems for which this simple concept of an
“energy” function generates a complete stable circuit design
without the need for a detailed dynamic analysis of
stability. The function produces the required values of the
many resistors from a short statement of the overall problem.

The two basic computations, digital decomposition and
the linear programming network, are quite different
computations in several respects. In the decomposition/decision
networks discussed, the answers are digital, and this
requirement that the stable states of the network lie on the
corners of the solution space determines the highly nonlinear
input-output relations for the variable amplifiers. Also,
the circuit equations of the
network are of no intrinsic relevance to the problem to be
solved: they are a program which is used to compute the
correct solution. In contrast, the amplifiers for the variables
in the linear programming network are linear and
furthermore the circuit equations of the linear programming
network (17) have a more direct relationship to the
problem to be solved; the constraint relationships are
explicitly represented. This is similar to conventional methods
of analog computation in which the processing elements
are chosen to compute specific terms in a differential
equation to be solved. In fact, a computational circuit
similar to that in Fig. 8 has been described [7]. Here we
have analytically shown the stability of this circuit design
and illustrate it as a limiting case of more general networks
for which the circuit equations do not necessarily relate to
the problem to be solved. Another distinction is that the
signal decision/deconvolution circuit makes a decision on
the basis of the absolute values of its analog inputs, while
the linear programming circuit decisions are based only on
the relative values of the input amplitudes A. This self-scaling
property is often desired in signal processing and
pattern recognition.

The practical usefulness of analog computational networks
remains to be determined. Here, we have demonstrated
that for “simple” computational tasks and well-defined
initial conditions, the networks can sometimes be
guaranteed to find the global optimum solution. The
major advantage of these architectures is their potential
combination of speed and computational power [1]. Interesting
practical uses of such circuits for complicated problems
necessitate huge numbers of connections (resistors)
and amplifiers. Such circuits might be built in integrated
circuit technology. Work has begun on questions of the
microfabrication of extensive resistive connection matrices
[8], [9]. Optical implementations of such circuits are also
feasible [10].
lished adenylate cyclase system. In hormone-dependent
adenylate cyclase there is an assemblage of individual
components (receptor, GTP-binding protein, and catalytic
moiety) for signal transduction (22, 23). In contrast, the
presence of dual activities (receptor binding and enzymic)
on a single polypeptide chain indicates that this
transmembrane protein contains both the information for
signal recognition and its translation into a second
messenger. It is possible that a third signal component
(probably a lipid or an accessory protein) is needed to
link these two activities functionally.

Note added in proof: Although the antibody to the 180-kD
guanylate cyclase blocks guanylate cyclase activity, it does
not inhibit the binding of ANF to the protein. This
indicates that either the antibody is solely against the
guanylate cyclase epitope of the protein or that there are
two tightly coupled 180-kD proteins which are inseparable
by the present techniques.
Technical Comments


Computing with Neural Networks


Hopfield and Tank (1) refer to “A new
concept for understanding the dynamics of
neural circuitry” using the equation (in a
slightly different notation)

$$C_i\frac{du_i}{dt} = -\frac{1}{R_i}u_i + \sum_{j=1}^{n} T_{ij} f_j(u_j) + I_i \quad (i = 1, \ldots, n) \qquad (1)$$

for the neuron state variables $u_i$. The concept
is that the variables $u_i(t)$ approach
equilibrium as $t \to \infty$ if the connections $T_{ij}$
are symmetric ($T_{ij} = T_{ji}$). Hopfield and
Tank also state that “a nonsymmetric circuit
. . . has trajectories corresponding to
complicated oscillatory behaviors . . . but as
yet we lack the mathematical tools to manipulate
and understand them at a computational
level” (1, p. 629), and that “the
symmetry of the networks is natural because,
in simple associations, if A is associated
with B, B is symmetrically associated
with A” (1, p. 629).

Associations are often asymmetric, as in
the asymmetric error distributions arising
during list learning (2). Neural network
models (3) explain these distributions when
one uses Eq. 1 supplemented by an associative
learning equation for the connections

$$\frac{d}{dt}T_{ij} = -A\,T_{ij} + B\,u_i f_j(u_j) \qquad (2)$$

Because of the nonlinear term $u_i f_j(u_j)$ in
Eq. 2, $T_{ij} \neq T_{ji}$.

Stability theorems (4) have been proved 
about neural networks which include and 
generalize Eqs. 1 and 2. Thus symmetry is 
not necessary to prove associative learning 
and memory storage by neural networks. 
Nor is symmetry needed to design stable 
neural networks for adaptive pattern recog- 
nition (5). Methods have also been devel- 
oped (6) for analyzing the oscillatory behav- 
ior of neural circuits. We believe that the 
relation between symmetry and stability in 
neural networks is much more subtle and 
better understood than Hopfield and Tank 
(1) suggest. 

Nonetheless, symmetry does help to. ana- 
lyze the system represented by Eq. 1. In fact, 
we (M.A.C. and S.G.) (7) independently 
discovered an energy function for neural 
networks “designed to transform and store 
a large variety of patterns. Our analysis 
includes systems which possess infinitely 
many equilibrium points” (7, p. 818), exam- 
ples of which have been constructed (8). 
These networks are 


® 


= a,(x,)[ bi) ~ > 6y4,() | 


dis; 
dt 


Given symmetric connections (¢j = cj), the 
energy function is 


va-2 iE bi GaN Ea + 
5, enehioy)dlon) (4) 
2 ann 

Along system trajectories 


eth 
geY = ~2anlsaalo) [bie4) ~ 


Zeenat) (5) 


d 
If a,(4;) = 0 and di(u;) = 0, then a =0, 


which is the key property of an energy 
function. We (M.A.C. and S.G.) have noted 
that “the simpler additive neural networks 
.-. are also included in our analysis” (7, p. 
819). The system represented by Eq. 3 
reduces to the additive network (Eq. 1) 
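when $a_i(u_i) = C_i^{-1}$, $b_i(u_i) = -u_i/R_i + I_i$,
$c_{ij} = -T_{ij}$, and $d_j(u_j) = f_j(u_j)$. Then

$$V = \sum_{i=1}^{n} \frac{1}{R_i} \int_0^{u_i} \xi_i\, f_i'(\xi_i)\, d\xi_i - \sum_{i=1}^{n} I_i\, f_i(u_i) - \frac{1}{2} \sum_{j,k=1}^{n} T_{jk}\, f_j(u_j)\, f_k(u_k) \tag{6}$$
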

which includes the energy functions used in 
(1). We (M.A.C. and S.G.) (7) also analyzed 
the more difficult and physiologically im- 
portant cases where the cells obey mem- 
brane, or shunting, equations and the signal 
functions $d_i(u_i)$ may have output thresholds.
Thus we consider the “new concept” in (1)
to be a recent special case of an established 
neural network theory (7, 8, 9). 
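
The reduction from the general energy function to the additive case can be written out step by step. The sketch below assumes the general energy has the standard form V = -Σ_i ∫ b_i d_i' dξ + (1/2) Σ c_jk d_j d_k and uses the substitutions named in the text; the additive constant is our own remark, not the letter's.

```latex
% Substitute b_i(\xi) = -\xi/R_i + I_i, \; c_{jk} = -T_{jk}, \; d_j = f_j into the general energy:
\begin{align*}
V &= -\sum_{i} \int_0^{u_i} \Big(-\frac{\xi_i}{R_i} + I_i\Big) f_i'(\xi_i)\, d\xi_i
     \;-\; \frac{1}{2} \sum_{j,k} T_{jk}\, f_j(u_j)\, f_k(u_k) \\
  &= \sum_{i} \frac{1}{R_i} \int_0^{u_i} \xi_i\, f_i'(\xi_i)\, d\xi_i
     \;-\; \sum_{i} I_i \big[ f_i(u_i) - f_i(0) \big]
     \;-\; \frac{1}{2} \sum_{j,k} T_{jk}\, f_j(u_j)\, f_k(u_k).
\end{align*}
% The term \sum_i I_i f_i(0) is a constant; dropping it does not change dV/dt.
```

Up to that constant, the result is the additive-network energy, and with $V_i = f_i(u_i)$ the first integral becomes $\int_0^{V_i} f_i^{-1}(V)\,dV$.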

Hopfield and Tank also assert that “Unexpectedly,
new computational properties resulted . . . from
the use of nonlinear graded-response neurons
instead of the two-state neurons of the earlier
models” (1, p. 625). It
has long been understood that two-state 
neuronal models differ computationally 
from graded-response models with sigmoid 
signal functions (6, 8, 10). 

The application of neural network theory 
to technology would be expedited by further 
consideration of known results (11). 

Hopfield and Tank (1) present a neural
model, a nonlinear feedback circuit, stating
that it has a “natural” capacity for solving
optimization problems. They amplify this
idea with two examples: the traveling salesman
problem (TSP) and the analog-to-digital
converter.

It is recognized that neural functions and 
control mechanisms of most involuntary ac- 
tions are optimized during evolution and 
ontogenesis. This, however, is not the same 
thing as solving abstract optimization prob- 
lems. For instance, how good are human 
beings at solving the TSP? The main objec- 
tion to this model is that the neuropsy- 
chological functions are not sufficiently lo- 
calized and specified for this type of high- 
level semantic representation (2). In particu- 
lar, there are two “unnatural” features in this 
approach. First is the special meaning attrib- 
uted to each neuron, for example, in the 
TSP [figure 5 of (1)], “A given neuron
($V_{Xj}$) represents the hypothesis that city X
is in position j in the solution.” Although it
is emphasized that the TSP is a nonbiological
problem, the most important processes
(formulation of the problem, generation
of a circuit analogy, feeding of input information,
and decoding of the outputs) are
defined by the authors, not by the system. 
As a physical model, this network is there- 
fore incomplete for simulating biological 
computation. On the other hand, a circuit is 
already known (3) in which a topographic 
order corresponding to a certain degree of 
semantical differentiation will be formed by 
self-organization. 

The second “unnatural” feature of their 
approach is the dedicated architecture or 
connectivity between the “neurons” which
must be tailored for every problem. The 
authors apply hindsight in designing the 
model: by conjecturing the form of the 


energy function corresponding to the system
structure and parameter values or by
computing them backward from the output 
state, they end up with the solution to the 
problem. This is especially apparent in the 
analog-to-digital converter example. 

The authors also state that their network
implements associative memory in a “natu- 
ral” fashion. This was in fact the main result 
of their original work (4). Similarly, in their
recent article the network interconnectivity 
is assumed a priori to be proportional to the 
correlation matrix of the wanted state vec- 
tors. If this network were to implement a 
genuine associative memory with a physical 
mechanism for both storage and recall, the 
network structure of (4) could be used (5), 
but the couplings should be formed adaptively,
relating to the input and output
signal values actually occurring all the time. 
Such a process then needs additional mathe- 
matical analysis (5, 6). 

Depending on the nature of interconnectivity,
the output state of such a feedback
system may then relax to the linear range (6) 
or to saturation (1, 3, 4, 7). Learning or 
adaptive effects seem to take place in the 
former (“linearized”) mode, while no learning
appears to have been involved in (2) or
(4). 

Hopfield and Tank (1) review the use of
networks for computation. Many current 
investigations are based on Hopfield’s ob- 
servation (2) that the asymptotic behavior of 

Computing with Neural Circuits: A Model


JOHN J. HOPFIELD AND DAVID W. TANK





A new conceptual framework and a minimization princi- 
ple together provide an understanding of computation in 
model neural circuits. The circuits consist of nonlinear
graded-response model neurons organized into networks 
with effectively symmetric synaptic connections. The neu- 
rons represent an approximation to biological neurons in 
which a simplified set of important computational prop- 
erties is retained. Complex circuits solving problems 
similar to those essential in biology can be analyzed and 
understood without the need to follow the circuit dynam- 
ics in detail. Implementation of the model with electronic 
devices will provide a class of electronic circuits of novel 
form and function. 





A complete understanding of how a nervous system
computes requires comprehension at several different levels.
Marr (1) noted that the computational problem the system
is attempting to solve (the problem of stereopsis in vision, for 
example) must be characterized. An understanding at this level 
requires determining the input data, the solution, and the transfor- 
mations necessary to compute the desired solution from the input. 
The goal of computational neurobiology is to understand what 
these transformations are and how they take place. Intermediate 
computational results are represented in a pattern of neural activity. 
These representations are a second, and system-specific, level of 
understanding. It is important to understand how algorithms— 
transformations between representations—can be carried out by 
neural hardware. This understanding requires that one comprehend 
how the properties of individual neurons, their synaptic connec- 
tions, and the dynamics of a neural circuit result in the implementa- 
tion of a particular algorithm. Recent theoretical and experimental 
work attempting to model computation in neural circuits has
provided insight into how algorithms can be implemented. Here we
define and review one class of network models (nonlinear graded-response
neurons organized into networks with effectively symmetric
synaptic connections) and illustrate how they can implement
algorithms for an interesting class of problems (2). 

Early attempts to understand biological computation were stimu- 
lated by McCulloch and Pitts, who described (3) a “logical calculus 
of the ideas immanent in nervous activity.” In these early theoretical 
studies, biological neurons were modeled as logical decision elements
described by a two-valued state variable (on-off), which were
organized into logical decision networks that could compute simple
Boolean functions. The timing of the logical operations was con- 
trolled by a system clock. In studies of the “perceptron” by 
Rosenblatt (4), simple pattern recognition problems were solved by 
logical decision networks that used a system of feed-forward synap- 
tic connectivity and a simple learning algorithm. Several reviews of 
McCulloch and Pitts and perceptron work are available (5). More 
recent studies have used model neurons having less contrived 
properties, with continuous dynamics and without the computerlike
clocked dynamics. For example, Hartline et al. (6) showed that
simple linear models with continuous variables could explain how 
lateral inhibition between adjacent photoreceptor cells enhanced the 
detection of edges in the compound eye of Limulus. Continuous 
variables and dynamics have been widely used in simulating mem- 
brane currents and synaptic integration in single neurons (7) and in 
simulating biological circuits, including central pattern generators
(8) and cortical structures (9). Both two-state (10, 12) and continuous-valued
nonlinear models (12) have been extensively studied in
networks organized to implement algorithms for associative memo- 
ries and associative tasks (13). 

The recent work being reviewed here has been directed toward an 
understanding of how particular computations can be performed by
selecting appropriate patterns of synaptic connectivity in a simple 
dynamical model system. Circuits can be designed to provide 
solutions to a rich repertoire of problems. Early work (10) was 
designed to examine the computational power of a model system of 
two-state neurons operating with organized symmetric connections. 
The inclusion of feedback connectivity in these networks distin- 
guished them from perceptron-like networks, which emphasized 
feed-forward connectivity. Graded-response neurons described by 
continuous dynamics were combined with the synaptic organization 
described by earlier work to generate a more biologically accurate 
model (14) whose computational properties include those of the 
earlier model. General principles for designing circuits to solve 
specific optimization problems were subsequently developed (15-17).
These networks demonstrated the power and speed of circuits
that were based on the graded-response model. Unexpectedly, new 
computational properties resulted (15) from the use of nonlinear 
graded-response neurons instead of the two-state neurons of the 
earlier models. The problems that could be posed and solved on 
these neural circuits included signal decision and decoding prob- 
lems, pattern recognition problems, and other optimization prob- 
lems having combinatorial complexity (15-20). 

One lesson learned from the study of these model circuits is that a 
detailed description of synaptic connectivity or a random sampling 
of neural activity is generally insufficient to determine how the 
circuit computes and what it is computing. As an introduction to the
circuits we review, this analysis problem is illustrated on a simple 
and well-understood model neural circuit. We next define and 
discuss the simple dynamical model system and the underlying 
assumptions and simplifications that relate this model to biological 
neural circuits. A conceptual framework and minimization principle 
applied to the model provide an understanding of how these circuits 
compute, specifically, how they compute solutions to optimization 
problems. The design and architecture of circuits for two specific
problems are presented, including the formerly enigmatic circuit 
used earlier to illustrate the analysis problem. 

Understanding Computation in a Simple
Neural Circuit 


Let us analyze the hypothetical neural circuit shown in Fig. 1 with
simulation experiments based on the tools and methods of neuro- 
physiology and anatomy. The analysis will show that the usual 
available neurobiological measures and descriptions are insufficient 
to explain how even small circuits of modest complexity compute. 
The seven-neuron circuit in Fig. 1 is designed to compute in a
specific way that will later be described. From a neurobiological
viewpoint, the basic anatomy of the circuit contains four principal
neurons (21), identified in the drawing as P0, P1, P2, and P3. Each
neuron has an axon leaving the circuit near the bottom of the figure. 
The computational results of the circuit must be evident in the 
activity of these neurons. The one input pathway, from a neuron 
external to the circuit, is provided by axon Q. Neurons IN1, IN2,
and IN3 are intrinsic interneurons in the circuit.

In attempting to understand the circuit’s operation, we simultaneously
monitor the activity (computer simulated) in each of the
seven neurons while providing for a controllable level of impulse 
activity in the input axon Q. Results from this experiment on the 
hypothetical circuit for several fixed levels of input activity are 
shown in Fig. 2A. The top trace represents our controlled activity in
Q. In each time epoch this activity is progressively larger, as 
illustrated by the increasing number of action potentials per unit 
time. Although the activity of INs is steadily rising as the activity in 
Q increases, the activities of the other neurons in the circuit are not 
simply related to this input. From these results we know what the 
output patterns of activity on the principal neurons are for specific 


Fig. 1. “Anatomy” of a simple model neural circuit. Input axon Q has
excitatory synapses (direct or effective) on each of the principal neurons P0
through P3. Each of these principal neurons has inhibitory synapses (direct
or indirect) with all other principal neurons. Inhibitory synapses are shaded.
IN1 to IN3, intrinsic interneurons.

levels of impulse activity on the input axon Q, but we cannot explain
what computation the circuit is computing. Furthermore, we do not 
know how the structure and organization of the circuit has provided 
these particular patterns of neural activities for the different input 
intensities. 

Study of the synaptic organization of the connections between the 
neurons by electrophysiological or ultrastructural techniques could 
provide the numerical description of synaptic strengths shown in 
Table 1. The results of these experiments would show that each
individual principal neuron $P_j$ inhibits the other three principal
neurons ($P_i$). There is either monosynaptic inhibition from $P_j$ to $P_i$
or polysynaptic inhibition by an excitatory synapse from $P_j$ to an
interneuron ($\mathrm{IN}_k$), which then forms an inhibitory synapse with $P_i$
(for example, the Pj-to-P2 pathway in Fig. 1). This synaptic
organization provides an “effective” inhibitory synapse between any
two principal neurons; an action potential elicited in one principal
neuron always contributes to inhibition of each of the others.
Similar experiments measuring the strengths of the synaptic connections
between the input axon Q and the $P_i$ would show effective
excitatory connections (Table 1). While the organization between 
principal neurons could be described classically as “lateral inhibition,”
the output patterns of activity in the $P_i$, shown in Fig. 2A for
different input intensities, cannot be explained by this qualitative
description.

Given the synaptic strengths in Table 1 and an appropriate 
mathematical description of the neurons, we can simulate the model 
neural circuit and produce the output activity patterns for the
different inputs. Such detailed simulations can also be done for real 
neural circuits if the required parameters are known. In general, an 
ability to correctly predict a complex result that relies solely on 
simulation of the system provides a test of the simulation model, but 
does not provide an understanding of the result. Thus, despite our 
classical analysis of the simple neural circuit in Fig. 1, we still have 
no understanding of why these particular synaptic strengths (Table 
1) provide these particular relations between input and output 
activity. Computation in the circuit shown in Fig. 1 can, however, 
be defined and understood within the conceptual framework pro- 
vided by an analysis of dynamics in the simple neural circuit model 
we now discuss. 


The Model Circuits and Their 
Relation to Biology 


Neurons are continuous, dynamical systems, and neuron models 
must be able to describe smooth, continuous quantities such as 

graded transmitter release and time-averaged pulse intensity. In
McCulloch-Pitts models, neurons were logical decision elements 
described by a two-valued state variable (on-off) and received 
synaptic input from a small number of other neurons. In general, 
McCulloch-Pitts models do not capture two important aspects of 
biological neurons and circuits: analog processing and high inter- 
connectivity. While avoiding these limitations, we still want to 
model individual neurons simply. In the absence of appropriate 
simplifications, the complexities of the individual neurons will loom
so large that it will be impossible to see the effects of organized
synaptic interactions. A simplified model must describe a neuron’s
effective output, input, internal state, and the relation between its
input and output. 

In the face of the staggering diversity of neuronal properties, the 
goal of compressing their complicated characteristics is especially
difficult. For the present, let us consider a prototypical biological
neuron having inputs onto its dendritic arborization from other
neurons and outputs to other neurons from synapses on its axon. 
Action potentials initiate near the soma and propagate along the
axon, activating synapses. Although we could model the detailed
synaptic, integrative, and spike-initiating biophysics of this neuron, 
following, for example, the ideas of Rall (7), the first simplification 
we make in our description of the neuron is to neglect electrical 
effects attributable to the shape of dendrites and axon. (The axon 
and dendrite space-constants are assumed to be very large.) Our 
model neuron has the capacitances and conductances of the arbori- 
zation added directly to those of the soma. The input currents from 
all synaptic channels are simply additive; more complex interactions 
between input currents are ignored. Membrane potential changes 
are assumed to arrive at the presynaptic side of synapses at the same 
time as they are initiated at the soma. The second simplification is to
deal only with “fast” synaptic events. When a potential fluctuation
occurs in the presynaptic terminal of a chemical synapse, a change in 
the concentration of neurotransmitter is followed (with a slight 
delay) by a current in the postsynaptic cell. In our model neurons we
presume this delay is much shorter than the membrane time 
constant of the neuron. 

These two suppositions on time scale mean that when a change in 
potential is initiated at the soma of cell j, it introduces an effectively
instantaneous conductance change in a postsynaptic cell i. The
amount of the conductance change depends on the nature and 
strength of the synapse from cell j to cell i. 

Biological neurons that produce action potentials do so (in steady
state) at a rate determined by the net synaptic input current. This
current acts indirectly by charging the soma and changing the cell
potential. A characteristic charging or discharging time constant is 
determined by the cell capacitance C and membrane resistance R. 
The input current is “integrated” by the cell RC time constant to 
determine a value of an effective “input-potential,” $u$. Conceptually,
this potential $u$ is the cell membrane potential after deletion of the
action potentials. Action potentials (and postsynaptic responses in
follower cells) are then generated at a rate dependent on the value of
$u$. Dependencies of firing rates on input currents (and hence $u$) vary
greatly, but have a generally sigmoid and monotonic form (Fig.
3A), rising continuously between zero and some maximum value
(22). The firing rate of cell i can be described by the function $f_i(u_i)$.
For processing in which individual action potentials are not synchronized
or highly significant, a model that suppresses the details
of action potentials should be adequate. In such a limiting case, two
variables describe the state of neuron i: the effective input potential
$u_i$ and the output firing rate $f_i(u_i)$. The strength of the synaptic
current into a postsynaptic neuron i due to a presynaptic neuron j is
proportional to the product of the presynaptic cell’s output [$f_j(u_j)$]

Fig. 2. (A) Results of an experiment in which the activity in each neuron in
the circuit of Fig. 1 was simultaneously recorded (by simulation) as a
function of the strength of the input stimulus on axon Q. The strength of the
input stimulus is indicated by the numbers above each time epoch. (B) A
selective rearrangement of the data in (A) illustrating the analog-binary
computation being performed by the circuit. The digital word $V_3 V_2 V_1 V_0$ is
calculated from the records.

Table 1. Effective synaptic strengths for the circuit in Fig. 1

and the strength of the synapse from j to i. In our model, the
strength of this synapse is represented by the parameter $T_{ij}$, so that
the postsynaptic current is given by $T_{ij} f_j(u_j)$. The net result of our
description is that action potentials have their effects represented by 
continuous variables, just as the usual equations describing the 
behavior of electrical circuits replace discrete electrons by continu- 
ous charge and current variables. 

Many neurons, both central and peripheral, show a graded 
response and do not normally produce action potentials (23). The
presynaptic terminals of these graded-response neurons secrete 
neurotransmitters, and hence induce postsynaptic currents, at a rate 
dependent on the presynaptic cell potential. The effective output of 
such cells is also a monotonic sigmoid function of the net synaptic 
input. Thus the model treats both neurons with graded responses 
and those exhibiting action potentials with the same mathematics. 

We can now describe the dynamics of an interacting system of N 
neurons. The following set of coupled nonlinear differential equations
results from our simplifications and describes how the state
variables of the neurons ($u_i$; $i = 1, \ldots, N$) will change with time under
the influence of synaptic currents from other neurons in the circuit.

$$C_i \frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij}\, f_j(u_j) - \frac{u_i}{R_i} + I_i \qquad (i = 1, \ldots, N) \tag{1}$$


These equations might be thought of as a description of “classical” 
neurodynamics (12, 14). They express the net input current charging
the input capacitance $C_i$ of neuron i to potential $u_i$ as the sum of
three sources: (i) postsynaptic currents induced in i by presynaptic
activity in neuron j, (ii) leakage current due to the finite input
resistance $R_i$ of neuron i, and (iii) input currents $I_i$ from other
neurons external to the circuit. The time evolution of any hypothetical
circuit, defined by specific values of $T_{ij}$, $I_i$, $f_j$, $C_i$, and $R_i$, can be
simulated by numerical integration of these equations. 
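
The numerical integration mentioned above can be sketched in a few lines. This is a minimal forward-Euler example, assuming Eq. 1 reads C_i du_i/dt = Σ_j T_ij f(u_j) - u_i/R_i + I_i and taking a logistic sigmoid for the firing-rate function; the function names, the two-neuron circuit, and all parameter values are illustrative assumptions rather than anything specified in the article.

```python
import math

def g(u):
    """Sigmoid firing-rate function with maximum 1 (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-u))

def simulate(T, I, C, R, u0, dt=0.01, steps=2000):
    """Forward-Euler integration of C_i du_i/dt = sum_j T_ij g(u_j) - u_i/R_i + I_i."""
    n = len(u0)
    u = list(u0)
    for _ in range(steps):
        f = [g(x) for x in u]  # firing rates at the current potentials
        du = [(sum(T[i][j] * f[j] for j in range(n)) - u[i] / R[i] + I[i]) / C[i]
              for i in range(n)]
        u = [u[i] + dt * du[i] for i in range(n)]
    return u

# Hypothetical circuit: two neurons with symmetric mutual inhibition and equal drive.
T = [[0.0, -8.0], [-8.0, 0.0]]
u_final = simulate(T, I=[4.0, 4.0], C=[1.0, 1.0], R=[1.0, 1.0], u0=[0.1, -0.1])
```

With these assumed values the neuron that starts slightly ahead ends with a high input potential and suppresses the other, a simple case of lateral inhibition selecting one output.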

Some intuitive feeling for how a model neural circuit might
behave can be provided by considering the electrical circuit shown in 
Fig. 3B, which obeys the same differential equation (Eq. 1). The 
“neurons” consist of amplifiers in conjunction with feedback circuits 
composed of wires, resistors, and capacitors organized to represent 
axons, dendritic arborization, and synapses connecting the neurons. 
The firing rate function of our model neurons [$f_i(u_i)$] is replaced in
the circuit by the output voltage $V_i$ of amplifier i. This output is
$V_i = V_i^{\max} g_i(u_i)$, where the dimensionless function $g_i(u_i)$ has the
same sigmoid monotonic shape (Fig. 3A) as $f_i(u_i)$ and a maximum
value of 1. $V_i^{\max}$ is the electrical circuit equivalent of the maximum
firing rate of cell i. The input impedance of our model neuron is
represented in the circuit by an equivalent resistor $\rho_i$ and an input
capacitor $C_i$ connected in parallel from the amplifier input to
ground. These components define the time constants of the neurons
and provide for the integrative analog summation of the synaptic
input currents from other neurons in the network. To provide for
both excitatory and inhibitory synaptic connections between neurons
while using conventional electrical components, each amplifier
is given two outputs: a normal (+) output and an inverted (-)
output of the same magnitude but opposite in sign. A synapse
between two neurons is defined by a conductance $T_{ij}$, which
connects one of the two outputs of amplifier j to the input of
amplifier i. This connection is made with a resistor of value
$R_{ij} = 1/|T_{ij}|$. If the synapse is excitatory ($T_{ij} > 0$), this resistor is connected
to the normal (+) output of amplifier j. For an inhibitory synapse
($T_{ij} < 0$), it is connected to the inverted (-) output of amplifier j.
Thus, the normal and inverted outputs for each neuron allow for the 
construction of both excitatory and inhibitory connections through 
the use of normal (positive valued) resistors. The circuits include a 
wire providing an externally supplied input current $I_i$ for each
neuron (Fig. 3B). These inputs can be used to set the general level of
excitability of the network through constant biases, which effectively
shift the input-output relation along the $u_i$ axis, or to provide direct
parallel inputs to drive specific neurons. As in Eq. 1, the net input 
current to any neuron is the sum of the synaptic currents (flowing 





[Figure 3 graphic. Panel A axes: firing rate f(u), up to fmax, versus input potential u.]


Fig. 3. (A) The sigmoid monotonic input-output relation used for the model 
neurons. (B) The model neural circuit in electrical components. The output 
of any neuron can potentially be connected to the input of any other neuron. 
Black squares at intersections represent resistive connections (with conductance
$T_{ij}$) between outputs and inputs. Connections between inverted
outputs (represented by the circles on the amplifiers) and inputs represent
negative (inhibitory) connections. 







through the set of resistors connecting its input to the outputs of the 
other neurons), externally provided currents, and leakage current. 
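
The wiring rule for a single synapse can be stated as a small function. This is only a sketch: it assumes the resistor value is the reciprocal of the magnitude of the conductance, R_ij = 1/|T_ij|, and the function name and return format are hypothetical.

```python
def synapse_to_resistor(T_ij):
    """Map a signed synaptic conductance to (resistance, amplifier output to tap).

    Assumes R_ij = 1/|T_ij|. Excitatory synapses (T_ij > 0) tap the normal (+)
    output of amplifier j; inhibitory synapses (T_ij < 0) tap the inverted (-) output.
    """
    if T_ij == 0:
        return None  # no connection between the two neurons
    output = "normal" if T_ij > 0 else "inverted"
    return (1.0 / abs(T_ij), output)

excitatory = synapse_to_resistor(0.5)   # resistor on the normal output
inhibitory = synapse_to_resistor(-4.0)  # resistor on the inverted output
```

Because the sign is carried by the choice of amplifier output, every resistor in the circuit has a positive value.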

In the model represented by Eq. 1 and Fig. 3, the properties of 
individual model neurons have been oversimplified, in comparison 
with biological neurons, to obtain a simple system and set of 
equations. However, essential features that have been retained 
include the idea of a neuron as transducer of input to output, with a 
smooth sigmoid response up to a maximum level of output; the 
integrative behavior of the cell membrane; large numbers of excitatory
and inhibitory connections; the reentrant or feedback nature of
the connections; and the ability to work with both graded-response
neurons and neurons that produce action potentials. None of these 
features was the result of approximations. Their inclusion in a 
simplified model emphasizes features of the biological system we 
believe important for computation. The model retains the two 
important aspects for computation: dynamics and nonlinearity. 

The model of Eq. 1 and Fig. 3 has immense computing power,
achieved through organized synaptic interactions between the neurons.
The model neurons lack many complex features that give
biological neurons, taken individually, greater computational capa- 
bilities. It seems an appropriate model for the study of how the 
cooperative effects of neuronal interactions can achieve computa- 
tional power. 


A New Concept for Understanding the 
Dynamics of Neural Circuitry 


A specific circuit of the general form described by Eq. 1 and Fig. 3 
is defined by the values of the synapses ($T_{ij}$) and input currents ($I_i$).
Given this architecture, the state of the system of neurons is defined
by the values of the outputs $V_i$ (or, equivalently, the inputs $u_i$) of
each neuron. The circuit computes by changing this state with time.
In a geometric space with a Cartesian axis for each neural output $V_i$,
the instantaneous state of the system is represented by a point. A 
given circuit has dynamics that can be pictured as a time history or 
motion in this state space. For a circuit having arbitrarily chosen 
values for the synaptic connections, these motions can be very 
complex, and no simplifying description has been found. A broad 
class of simplified circuits, however, has a unifying principle of 
behavior while remaining capable of powerful computation. These 
circuits are literally or effectively symmetric. 

A symmetric circuit is defined as having synaptic strength and 
sign (excitation or inhibition) of the connection from neuron i to j 
the same as from j to i. The two neurons need not, however, have 
the same input-output relation, threshold, or capacitance. Our 
model circuit (Fig. 3B) is symmetric if, for all i and j, $T_{ij}$ is equal to
$T_{ji}$. This symmetry refers only to connections between neurons in
the circuit. It specifically excludes the input connections (represented
in Fig. 3B as the input currents $I_i$) and any output connections
from the circuit. 

Symmetry of the connections results in a powerful theorem about 
the behavior of the system. The only additional conditions necessary 
are that the input-output relation of the model neurons be monotonic
and bounded and that the external inputs $I_i$ (if any) should
change only slowly over the time of the computation. The theorem
shows that a mathematical quantity E, which might be thought of as
the “computational energy,” decreases during the change in neural
state with time described by Eq. 1. Started in any initial state, the
system will move in a generally “downhill” direction of the E
function, reach a state in which E is a local minimum, and stop
changing with time. The system cannot oscillate. This concept can
be illustrated graphically by a flow map in a state-space diagram.
Each line corresponds to a possible time-history of the system, with
arrows showing the direction of motion. The structure imposed on
the flow map for a circuit with symmetry is illustrated for a two- 
dimensional state space in Fig. 4. With symmetric connections, the 
flow map of the neural dynamics resembles Fig. 4B. Such a flow, in 
which each trajectory goes to stable points and stops, results from 
always going “downhill” on an “energy-terrain,” coming to the 
bottom of a local valley, and stopping. The contour map of an E 
function that matches the flow in Fig. 4B is shown in Fig. 4A; it 
shows separated hills and valleys. The valleys are located where the 
trajectories in Fig. 4B stop. For a nonsymmetric circuit, the
complications illustrated in the flow map in Fig. 4C can occur. This
flow map has trajectories corresponding to complicated oscillatory
behaviors. Such trajectories are undoubtedly important in neural 
computations, but as yet we lack the mathematical tools to manipu- 
late and understand them at a computational level. The motion of a 
neural circuit comprising N neurons must be pictured in a space of 
N dimensions rather than the two dimensions of Fig. 4, but the 
qualitative picture of the effects of symmetric synaptic strengths is 
exactly the same. 
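
The downhill motion can be checked numerically on a small symmetric circuit. The sketch below assumes the computational energy has the form E = -(1/2) Σ T_ij V_i V_j + Σ (1/R_i) ∫ g^{-1}(V) dV - Σ I_i V_i, which is not written out in this section, and a logistic sigmoid g, for which the integral has the closed form V ln V + (1 - V) ln(1 - V). The circuit, parameter values, and function names are illustrative assumptions.

```python
import math

def g(u):
    """Logistic sigmoid with maximum 1 (an assumed form of the neuron response)."""
    return 1.0 / (1.0 + math.exp(-u))

def energy(T, I, R, V):
    """Assumed energy: quadratic synaptic term + leakage integral + input term."""
    n = len(V)
    quad = -0.5 * sum(T[i][j] * V[i] * V[j] for i in range(n) for j in range(n))
    # Closed form of integral_0^V g^{-1}(v) dv for the logistic g.
    leak = sum((V[i] * math.log(V[i]) + (1 - V[i]) * math.log(1 - V[i])) / R[i]
               for i in range(n))
    drive = -sum(I[i] * V[i] for i in range(n))
    return quad + leak + drive

def energy_trace(T, I, C, R, u0, dt=0.01, steps=2000):
    """Euler-integrate Eq. 1 and record the energy after every step."""
    n = len(u0)
    u = list(u0)
    trace = [energy(T, I, R, [g(x) for x in u])]
    for _ in range(steps):
        V = [g(x) for x in u]
        u = [u[i] + dt * (sum(T[i][j] * V[j] for j in range(n)) - u[i] / R[i] + I[i]) / C[i]
             for i in range(n)]
        trace.append(energy(T, I, R, [g(x) for x in u]))
    return trace

# Hypothetical symmetric circuit: two mutually inhibiting neurons.
T = [[0.0, -8.0], [-8.0, 0.0]]
trace = energy_trace(T, I=[4.0, 4.0], C=[1.0, 1.0], R=[1.0, 1.0], u0=[0.1, -0.1])
max_increase = max(b - a for a, b in zip(trace, trace[1:]))
```

For this symmetric example the recorded energy never rises by more than rounding error from one step to the next; with unequal T[0][1] and T[1][0] the theorem gives no such guarantee.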

The computational energy is a global quantity not felt by an 
individual neuron. The states of individual neurons simply obey the 
neural equations of motion (Eq. 1). The computational energy is 
our way of understanding why the system behaves as it does. A 
similar situation occurs in the concept of entropy in a simple gas. We 
understand that when a nonequilibrium state is set up with all the air 
molecules in one corner of the room, a uniform distribution will 
rapidly result. We explain that fact by the tendency of the entropy of 
isolated systems to increase whenever possible, but the individual 
molecules know nothing of entropy. They simply follow their 
Newtonian equations of motion. 

Symmetric chemical synapses are observed in neural systems (24).
Nonrectifying electrical synapses are intrinsically symmetric synapses
of positive sign (25). Lateral inhibition in the visual system of
Limulus is implemented with symmetric inhibitory synapses (6). An 
asymmetric network can also behave as though it were symmetric. In 
the olfactory bulb, the local circuit of mitral cell to granule cell to 
mitral cell provides an equivalent symmetric inhibitory connection 
between the pair of mitral cells (26). A similar situation occurs in the
circuit shown in Fig. 1, where a direct equivalence between a neural
circuit which is manifestly not symmetric and one which is effectively
symmetric can be made if the inhibitory interneurons (IN1, IN2)
are faster than other neurons.

The requirement of symmetry for this theorem can also be
weakened. We have proven stability for a wide class of circuits
having organized asymmetry between two sets of neurons with
different time constants (16). (A neurobiological example would be
the existence, in mammalian systems, of fast inhibitory interneurons
that could provide effective symmetric inhibitory connections
between neurons that are otherwise excitatory.) In one potentially
useful example (16), stability could be guaranteed even though the
sign of $T_{ij}$ was always opposite that of $T_{ji}$. Also, there is a family of
transformations by which a broader class of synaptic organizations 
can be made equivalent to symmetric ones (27). From an empirical 
viewpoint, moderate disorganized asymmetry (for example, having 
a random set of connections missing in an otherwise symmetric 
associative memory circuit) has little experimental effect on dynamic 
stability (28). Because the general features of symmetric circuits 
persist in circuits that are only equivalently symmetric, and real 
neural circuits can often be so viewed (except for inputs and 
outputs), the behavior of symmetric circuit models should be of 
direct use in trying to understand how neural computation is done 
in biology. 

In general, systems having organized asymmetry can exhibit 
oscillation and chaos (29). In some neural systems like central 
pattern generators (8), coordinated oscillation is the desired compu- 
tation of the circuit. Processing in the olfactory bulb also seems to 
make explicit use of oscillatory patterns (30). In such a case, proper 
combinations of symmetric synapses can enforce chosen phase 
relationships between different oscillators, an effect similar to those
presented above.


Hard Problems Naturally Solved by Model 
Neural Circuits 


In thinking about how difficult computational problems can be 
done on such networks, it is useful to recall the simple problem of 
associative memory, which these networks implement in a “natural” 
fashion (10, 13). This naturalness has two aspects. (i) The symmetry 
of the networks is natural because, in simple associations, if A is 
associated with B, B is symmetrically associated with A. (ii) If the 
desired memories can be made the stable states of a network, the 
desired computation (given partial information as input, find the 
memory that most resembles it) can be directly visualized as a 
motion toward the nearest stable state whose position is the recalled 
memory. Finally, the way the connection strengths must be chosen 
for a given set of memories can be easily implemented by learning 
rules (13) such as the one proposed by Hebb (31).
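
As a minimal sketch of how a Hebb-type rule can make memories into stable states, the code below stores patterns by summing outer products and recalls one of them from a corrupted probe. Several things here are assumptions made for illustration and are not taken from the text: the ±1 coding of neuron states, the specific outer-product form of the rule, the two stored patterns, and the function names `hebbian_weights` and `recall`.

```python
def hebbian_weights(patterns):
    """T[i][j] = sum over stored patterns of p[i] * p[j], with no self-connection.

    The resulting matrix is symmetric: if i is associated with j,
    j is associated with i.
    """
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += p[i] * p[j]
    return T

def recall(T, probe, max_sweeps=20):
    """Update neurons one at a time until the state stops changing."""
    V = list(probe)
    n = len(V)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(T[i][j] * V[j] for j in range(n))
            new = 1 if net > 0 else (-1 if net < 0 else V[i])
            if new != V[i]:
                V[i] = new
                changed = True
        if not changed:
            break
    return V

# Two hypothetical memories (chosen orthogonal so they do not interfere).
memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
T = hebbian_weights([memory_a, memory_b])

# Partial information: memory_a with its first element corrupted.
probe = [-1, 1, 1, 1, -1, -1, -1, -1]
print(recall(T, probe) == memory_a)
```

The probe plays the role of "partial information as input"; the relaxation moves it to the nearest stored state.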

To what extent can more difficult computations (for example,
those relevant to object recognition or speech perception) be
carried out naturally on these model neural circuits? One of the
characteristics of such computations seems to be a combinatorial
explosion: the huge number of possible answers that must be
considered. The desired computation (for example, matching a set
of words to a sound pattern) can often be stated as an optimization.
Although it is not yet known how to map most biological problems
onto model circuits, it is now possible to design model circuits to
solve nonbiological problems having combinatorial complexity.
Because well-defined problems have been used, the effectiveness of
the neural circuit computation can be quantified. We will review
two circuit examples.

The idea of most algorithms or procedures for optimization is to 
move in a space of possible configurations representing solutions, 
progress in a direction that tends to decrease the cost function being 
minimized, and hope that the space and method of moving are 
smooth enough that a good solution will ultimately be reached. 
Such ideas lie behind conventional computer optimization algo-
rithms and the recent work in simulated annealing (32) and Bayesian
image analysis (33). In our approach (15-17), the optimization
problem is mapped onto a neural network in such a way that the
network configurations correspond to possible solutions to the
problem. An E function appropriate to the problem is then con-
structed. The form of the E function is chosen so that at configura-
tions representing possible solutions, E is proportional to the cost 
function of the problem. Since, in general, E is minimized as the 
circuit computes, the dynamics produce a path through the space 
that tends to minimize the energy and therefore the cost function. 
Eventually, a stable-state configuration is reached that corresponds 
to a local minimum in the E function. The solution to the problem is 
then decoded from this configuration. 

It is particularly easy to construct appropriate E functions when
the sigmoid input-output relation is steep, because in this
“high-gain” limit, each neuron will be either very near 0 output or
very near its maximal output when the system is in a low E stable
state (14). In the high-gain case, the energy function is

$$E = -\frac{1}{2} \sum_{i} \sum_{j} T_{ij} V_i V_j - \sum_{i} I_i V_i \qquad (2)$$

When lower gain is considered, terms containing the function $g_i(u_i)$
must be included in E (14). The following two examples make use of
this high-gain limit.

The simple seven-neuron circuit described in Fig. 1 was designed
according to this conceptual framework to be a four-bit analog-to-
binary (A-B) converter. Given an analog input to the circuit
represented by the time-averaged impulse activity in the input axon
Q, the neural circuit is organized to adjust the firing rates in the
principal neurons so that they can be interpreted as the binary
number numerically equal to the time-averaged input activity.
Reorganization of the data in Fig. 2A will illustrate this computa-
tion. In each time epoch in Fig. 2A, assign the value 0 or 1 to the
variable $V_i$ representing the output of $P_i$: if $P_i$ is firing strongly,
$V_i = 1$; if it is quiescent, $V_i = 0$. Represent the activity in axon Q by
a continuous variable X. The value of the binary word interpreted
from the ordered list of numbers ($V_3 V_2 V_1 V_0$) is plotted in Fig. 2B
for each of the different values of input strength X. The data points
(asterisks) lie on a staircase function (dotted line) characteristic of an
A-B converter. (Although not shown, the outputs computed for any
other input would also lie on this curve.)

Through the consideration of a specific energy function in the
high-gain limit and the synaptic strengths and inputs listed in Table
1, the behavior of the neural circuit can be predicted and under-
stood. We decide in advance that outputs $V_3 V_2 V_1 V_0$ of $P_3$ through
$P_0$ are interpreted as a computed binary word. The problem to be
solved is stated as an optimization: Given analog input X, which
binary word (set of outputs) best represents the numerical value of
X? The solution is provided when the following E is minimized
(16):

$$E = \frac{1}{2}\left(X - \sum_{j=0}^{3} 2^{j} V_j\right)^{2} + \sum_{j=0}^{3} 2^{(2j-1)}\, V_j \,(1 - V_j) \qquad (3)$$

The second term in E is minimized (and numerically equal to 0)
when all $V_j$ are either close to 0 or close to 1. Since E is minimized as
the circuit converges, stable states having the correct “syntax” tend
to develop. Since the first term in E is a minimum when the
expression in the parentheses vanishes, this term biases the circuit
towards the states closest, in the least-squares sense, to the analog
value of X. The E in Eq. 3 is, like that in Eq. 2, a quadratic in the $V_j$.
Rearranging Eq. 3 and comparing it with this general form yields
values for $T_{ij}$ and $I_i$ for a circuit of the form in Fig. 3B that can be
deduced within a common scale factor as

$$T_{ij} = -2^{(i+j)}, \qquad I_i = -2^{(2i-1)} + 2^{i} X \qquad (4)$$

The coefficient of X in $I_i$ is the synaptic strength from the input axon
Q to the principal neurons. These specific values are equal to the
strengths of the “effective” synapses tabulated in Table 1. Knowl-
edge that E is minimized as the circuit computes provides an
understanding of how this synaptic organization both enforces the
necessary syntax and biases the network to choose the optimum
solution.
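
The relation between the converter's energy and its synaptic strengths can be checked by brute force. The sketch below is a hedged illustration, not the circuit itself. It assumes the least-squares-plus-syntax energy $E = \frac{1}{2}(X - \sum_j 2^j V_j)^2 + \sum_j 2^{(2j-1)} V_j (1 - V_j)$, the values $T_{ij} = -2^{(i+j)}$ for $i \neq j$ and $I_i = -2^{(2i-1)} + 2^i X$, and no self-connections ($T_{ii} = 0$). It replaces the analog dynamics with an exhaustive search over the 16 binary words. All function names are hypothetical.

```python
from itertools import product

def ab_energy(bits, x):
    """Least-squares term plus a syntax term that vanishes at 0/1 outputs.

    bits is ordered (V0, V1, V2, V3), least significant first.
    """
    value = sum((2 ** i) * v for i, v in enumerate(bits))
    syntax = sum((2 ** (2 * i - 1)) * v * (1 - v) for i, v in enumerate(bits))
    return 0.5 * (x - value) ** 2 + syntax

def circuit_parameters(x, n_bits=4):
    """Connection strengths and input currents read off from the energy."""
    T = [[0.0 if i == j else -float(2 ** (i + j)) for j in range(n_bits)]
         for i in range(n_bits)]
    I = [-(2 ** (2 * i - 1)) + (2 ** i) * x for i in range(n_bits)]
    return T, I

def quadratic_energy(bits, T, I):
    """General high-gain form: -1/2 sum T_ij V_i V_j - sum I_i V_i."""
    n = len(bits)
    pair = sum(T[i][j] * bits[i] * bits[j] for i in range(n) for j in range(n))
    return -0.5 * pair - sum(I[i] * bits[i] for i in range(n))

def best_word(x, n_bits=4):
    """Exhaustive search standing in for the circuit's convergence."""
    return min(product((0, 1), repeat=n_bits), key=lambda b: ab_energy(b, x))

x = 6.3
T, I = circuit_parameters(x)
word = best_word(x)
print(word)  # (V0, V1, V2, V3) of the lowest-energy binary word
# The two energies differ only by the constant x**2 / 2:
print(ab_energy(word, x) - quadratic_energy(word, T, I))
```

The constant offset $X^2/2$ does not depend on the outputs, so both forms have the same minima; that is why the parameters are fixed only "within a common scale factor."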

Our second example is a neural circuit that computes solutions to
the traveling salesman problem (TSP) (15). In this frequently
studied optimization problem (34), a salesman is required to visit in
some sequence each of n cities; the problem is to determine the
shortest closed tour in which every city is visited only once. Specific
problems are defined by the distances ($d_{ij}$) between pairs of cities
(i, j). Assigning letters to the cities in a TSP permits a solution to be
specified by an ordered list of letters. For example, the list CAFGB is
interpreted as “visit C, then A, then F, then G, then B, and finally
return to C.” For an n-city TSP, this list can be decoded from the
outputs of $N = n^2$ neurons if we let a single neuron correspond to
the hypothesis that one of the n cities is in a particular one of the n
possible positions in the final tour list. This rule suggests the
arrangement illustrated in Fig. 5 for displaying the neural output
states. The output of a neuron ($V_{X,j}$) is graphically illustrated by
shading; a filled square represents a neuron which is “on” and firing
strongly. An empty square represents a neuron that is not firing. The
output states of the n neurons in each row are interpreted as
information about the location of a particular city in the tour
solution. The output states of the n neurons in each column are
interpreted as information about what cities are to be associated
with a particular position in the tour. If the neuron from column 5
in row C is “on,” the hypothesis that city C is in position 5 in the
final tour solution is true.

Hypothetically, each of the n cities could indicate its position in
any one of the n possible tour locations. Therefore, $2^N$ possible
“neural states” could conceivably be represented by these outputs.
However, only a subset of these actually correspond to valid
solutions to the TSP (valid tours): a city must be in one, and only
one, position in a valid tour, and any position must be filled by one
and only one city. This constraint implies that only output states in
which only one neuron is “on” in every row and in every column are
of the correct “syntax” to represent valid solutions to the TSP. A
TSP circuit that is to operate correctly must have synapses favoring
this subset of states. Simple lateral inhibition between neurons
within each row and column will provide this bias. For example, if
$V_{B,2}$ (representing city B in position 2) is “on,” all other neurons in
row B and column 2 should be inhibited. This can be provided by
the inhibitory connections from neuron $V_{B,2}$ drawn in Fig. 5 (red
lines). Similar row and column inhibitory connections are drawn for
neuron $V_{D,5}$. A complex “topology” of syntax-enforcing connec-
tions is generated. We can also think of these connections as
contributing a term to the E function for the circuit. For example, a
term $+A V_{X,i} V_{Y,i}$ in E makes a contribution $-A$ to the synaptic
strength $T_{Xi,Yi}$ and represents a mutual lateral inhibition between
neurons (X,i) and (Y,i). The term is positive (higher E) when both
of these neurons are “on,” but contributes nothing if only one of the
two is “on.” The proper combination of similar terms in an E
function can specify the synapses that coordinate correct syntax.

In a syntactically correct state representing a valid solution (tour),
if neurons $V_{X,j}$ and $V_{Y,j+1}$ are both “on,” the salesman travels from
city X directly to city Y. Therefore, the distance $d_{XY}$ between these
two cities is included in the total tour length for that solution. A
term of the form $+d_{XY} V_{X,j} V_{Y,j+1}$ in the E function provides a
“distance” contribution of $d_{XY}$ to the value of E when these neurons
are “on.” Similar terms, properly summed, will add to E a value
equal to the length of the tour. Since the circuit minimizes E, the
final state will be biased toward those valid solutions representing
short tours. Such inhibitory connections are drawn in Fig. 5 with
blue lines for neurons $V_{B,2}$ and $V_{D,5}$. In TSP and in the earlier
example, the rules of syntax are expressed in inhibitory connections.
It seems easier to define what these systems should not do (by
inhibitory connections), and to define what they should do by
default, rather than to define what they should do by writing syntax
in excitatory connections.
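
The two kinds of terms described above (row and column inhibition for syntax, distance-weighted terms for tour length) can be put together in a short sketch. It evaluates such an energy on binary states only and does not simulate the analog circuit. The penalty weight `A`, the wrap-around from the last tour position back to the first, the four-city example, and all function names are assumptions made for illustration. These terms alone do not exclude the state with every neuron off, so a full design would need additional terms not covered here.

```python
import math

def tour_state(order, n):
    """V[x][i] = 1 when city x occupies position i of the tour."""
    V = [[0] * n for _ in range(n)]
    for position, city in enumerate(order):
        V[city][position] = 1
    return V

def syntax_penalty(V):
    """Count pairs of 'on' neurons sharing a row (city) or a column (position)."""
    n = len(V)
    rows = sum(V[x][i] * V[x][j]
               for x in range(n) for i in range(n) for j in range(n) if i != j)
    cols = sum(V[x][i] * V[y][i]
               for i in range(n) for x in range(n) for y in range(n) if x != y)
    return (rows + cols) / 2          # each unordered pair counted once

def distance_term(V, d):
    """Sum of d[x][y] * V[x][i] * V[y][i+1], closing the tour at the end."""
    n = len(V)
    return sum(d[x][y] * V[x][i] * V[y][(i + 1) % n]
               for x in range(n) for y in range(n) if x != y
               for i in range(n))

def tsp_energy(V, d, A=10.0):
    return A * syntax_penalty(V) + distance_term(V, d)

# Hypothetical problem: four cities on the corners of a unit square.
cities = [(0, 0), (1, 0), (1, 1), (0, 1)]
d = [[math.hypot(ax - bx, ay - by) for (bx, by) in cities] for (ax, ay) in cities]

perimeter = tour_state([0, 1, 2, 3], 4)   # walk around the square
crossing = tour_state([0, 2, 1, 3], 4)    # uses both diagonals
print(tsp_energy(perimeter, d), tsp_energy(crossing, d))
```

For a syntactically valid state the penalty is zero and the energy equals the tour length, so the shorter tour has the lower energy.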

Fig. 5. A stylized picture of the syntax and connections of the TSP neural
circuit. Each neuron is symbolically indicated by a square. The neurons are
arranged in an n by n array. Each city is associated with n neurons in a row,
and each position in the final tour is associated with n neurons in a column.
A given neuron ($V_{X,j}$) represents the hypothesis that city X is in position j in
the solution. The patterns of synaptic connection for two different neurons
are indicated.

The inhibitory synapses define the computational connections for
the TSP circuit. With a common sigmoid gain curve, R, and C for
each neuron, the description of the circuit is complete. The gain
curve is chosen so that with zero input, a neuron has a nonzero but
modest output. This circuit can rapidly compute good solutions to a
TSP problem (15). When started from an initial “noise” state
favoring no particular tour, the network rapidly converges to a
steady state describing a very short tour. The state of the circuit at
several time points in a typical convergence is illustrated in Fig. 6. In
a 30-city problem, there are about $10^{30}$ possible tours; the combi-
natorial problem has gotten completely out of hand. But the circuit
of 900 neurons can find one of the best $10^{7}$ solutions in a single
convergence (a few time constants of the circuit). It selects good
answers and rejects bad ones by a factor of $10^{23}$.
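
The size of the search space quoted for the 30-city problem can be reproduced with a few lines of arithmetic. The counting formula is an assumption supplied here, not something stated in the text: a closed tour of n cities can start at any of its n cities and be traversed in either direction, so the n! orderings describe n!/(2n) distinct tours. The function names are hypothetical.

```python
import math

def distinct_tours(n):
    """Closed tours of n >= 3 cities, ignoring start city and direction."""
    return math.factorial(n) // (2 * n)

def neural_states(n):
    """On/off patterns of the n * n neurons in the TSP array."""
    return 2 ** (n * n)

tours_30 = distinct_tours(30)
print(f"{tours_30:.2e}")                    # order of magnitude of the tour count
print(len(str(neural_states(30))))          # number of decimal digits of 2**900
```

Under this count the 30-city problem has on the order of $10^{30}$ tours, so narrowing the result to the best $10^{7}$ corresponds to a rejection factor of about $10^{23}$.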

The continuous response characteristic of the analog neurons in 
the TSP circuit represents partial knowledge or belief. A value for
$V_{X,j}$ between 0 and 1 represents the “strength” of the hypothesis
that city X is in position j of the tour. During an analog conver-
gence, several conflicting solutions or propositions can be simulta-
neously considered through the continuous variables. It is as though
the logical operations of a calculation could be given continuous
values between “true” and “false” and evolve toward certainty only
near the end of the calculation. This is evident during the TSP 
convergence process (Fig. 6) and is important for finding good 
solutions to this problem (15). If the gain is greatly increased, the 
output of any given neuron will usually be either 1 or 0, and the 
potential analog character of the network will not be utilized. When 
operated in this mode, the paths found are little better than random. 
The analog nature of the neural response is in this problem essential 
to its computational effectiveness. This use of a continuous variable 
between true and false is similar to the theory of fuzzy sets (35) and 
to the use of evidence voting for the truth of competing proposi- 
tions in Bayesian inference and connectionist modeling in cognitive 
psychology (36). Two-state neurons do not capture this computa- 
tional feature. 


Discussion 


The work reviewed here has shown that a simple model of 
nonlinear neurons organized into networks with effectively symmet- 
ric connections has a “natural” capacity for solving optimization 
problems. The general behavior can be readily adapted to particular 
problems by appropriately selecting the synaptic connections. Opti-
mization problems are ubiquitous where goals are attempted in the
presence of conflicting constraints, and they arise in problems of
perception (What three-dimensional shape “best” describes a given
shading pattern in a two-dimensional image?), behavioral choice,
and motor control (What is the optimum trajectory to move an 
appendage to minimize internal stresses?). Hence circuits consistent 
with this model could efficiently solve problems important in 
biological information processing. 

Biologically relevant problems in vision have already been formu- 
lated in terms of optimization problems. Edge-detection, stereopsis, 
and motion detection can be described as “ill-posed” problems, and
solutions can be found by minimizing appropriate quadratic func- 
tionals (37). The emphasis in these formulations has been simple 
convex problems with a single minimum in the energy. Networks 
solving these problems can be implemented by linear circuits having 
local connections. The nonlinear circuits described here can imple- 
ment solutions to much more complex problems and have recently 
been used to solve the object-discontinuity problem in early vision 
(8). 

The concept of an energy function and its use in circuit design
provide an understanding of how model neural circuits rapidly
compute solutions to optimization problems. The state of each
neuron changes in time in a simple way determined by the state of
neurons to which it is connected, but the organization of the
synapses results in collective dynamics that minimize an E function
relevant to the optimization problem. Knowledge of this E function
helps us understand the collective dynamics. The two circuit exam-
ples reviewed here, the A-B converter and TSP circuit, were
“forward-engineered.” Given the optimization problem, a represen-
tation of hypothetical solutions to the problem as a particular set of
neural states was constructed. Synaptic connections in the operating
circuit move the neural state toward these solution states and bias
this motion toward the best solution. The values of these synaptic
strengths were summarized in the single algebraic statement of the E
function. [The two problems illustrate different ways in which
“data” modulate the circuit parameters: as input currents in the A-B
converter or as changes in the connection strengths in the TSP
circuit (17).] Forward-engineered examples of model neural circuits
add to the known repertoire of computational circuits that seem
neurobiologically plausible. The general problem of neurobiology is
“reverse engineering”: to understand the operation of a complex
biological circuit with unknown design principles and internal
representations. In general, the set of neural circuits whose function-
ing is understood provides an information base for hypothesizing
function in biological neural circuits in the same way that the study
of understood electrical circuits aids the attempt to understand or
reverse engineer an unfamiliar electrical circuit diagram.

Fig. 6. The convergence of a ten-city analog circuit to a tour. The linear
dimension of each square is proportional to the value of $V_{X,j}$. (A to C)
Intermediate times. (D) The final state. Indices illustrate how the final state
is decoded into a tour (solution of the TSP).

When a problem falls naturally onto a neural circuit, its conver- 
gence to a collective analog decision in a few time constants 
represents immense computation for the amount of hardware 
involved. For example, the 30-city TSP can be done on a network of
900 neurons. When that kind of combinatorial problem occurs in
perception and pattern recognition, the input to the system will
occur in parallel and take little time. A biological neural network of
this structure would converge to an answer in a few neural time
constants, thus in about 0.1 second. An electronic circuit of the same
structure would converge in about 1 µsec. A comparably good
solution to this problem, with conventional algorithms used for the
TSP, can be found in about 0.1 second on a typical microcomputer
having $10^{4}$ times as many devices. The effectiveness of the neural
system on the basis of computations per device per time constant is
great in comparison with the usual general-purpose digital machine.
The ability of the model networks to compute effectively is based on 
large connectivity, analog response, and reciprocal or reentrant 
connections. The computations are qualitatively different from those 
performed by Boolean logic. 


Other specific circuit designs have been studied. Many problems 
in signal processing can be described as the attempt to detect the 
presence or absence of a waveform having a known stereotyped 
shape in the presence of other waveforms and noise. (The recogni-
tion of phonemes in a stream of speech is conceptually similar, but
fraught with large problems of variability from the stereotype form.)
We have described the general organization of neural circuits that 
could solve this task (16). Energy functions have been described for 
other combinatorial optimization problems, including graph color- 
ing (17), the Euclidean-match problem (17), and the transposition 
code problem (15). Circuits that relax the restriction on a symmetric 
connection matrix (as biology does) have also been studied. A 
circuit designed to provide solutions to linear programming prob-
lems (16) functions without oscillation when the characteristic times
of these elements are properly specified, even though its computing 
elements have antisymmetric connection strengths. The associative 
memory originally discussed (10) and used in a model of learning in 
a simple invertebrate (38) can be described as an optimization 
problem (15). The same conceptual framework can seemingly be 
applied to a large number of different problems. 

Because the basic idea of the model neural circuit can be expressed 
as an electrical circuit, there have been efforts to build such 
hardware. Associative memories of 32 neurons (amplifiers) have 
been built in conventional electrical circuit technology (39). A 22- 
neuron circuit has been successfully microfabricated on a single 
silicon chip (40). Shrinking this kind of network to a compact size 
seems possible (41). The most compact and useful form of such a 
device would involve an electrically writable resistance change in a 
two-terminal device, which would function approximately as a 
Hebbian (31) synapse. Examples of such material fabrications exist 
(42). A 32-neuron system has been fabricated that uses optics to 
implement connections (43). Technological questions have so far 
focused chiefly on associative memory. Similar circuits could be used 
to solve problems in signal detection and analysis, such as artificial 
visual systems, in which there tends to be immense data overload 
and where concurrent distributed processing is desired. 

In both biological neural systems and man-made computing 
structures, hierarchy and rhythmic or timed behaviors are impor-
tant. The addition of rhythms, adaptation, and timing provides a
mechanism for moving from one aspect of a computation to another 
and for dealing with time-dependent inputs and will lead to new 
computational abilities even in small networks. Hierarchy is neces- 
sary to keep the number of synaptic connections to a reasonable 
level. To extend the present ideas from neural circuit to neural 
system, such notions will be essential.


Arctic Research in the National Interest


A. L. WASHBURN AND GUNTER WELLER





The Arctic Research and Policy Act of 1984 was designed 
to advance arctic research in the national interest. Some of 
the research fields that require attention are weather and 
climate; national defense; renewable and nonrenewable
resources; transportation; communications and space-
disturbance effects; environmental protection; health,
culture, and socioeconomics; and international cooper-
ation. A research framework recommended by the Arctic
Research Commission includes, in order of priority, inte- 
grated investigations to understand: (i) the Arctic Ocean 
(including the marginal seas, sea ice, and seabed) and how 
the ocean and atmosphere operate as coupled components 
of the arctic system; (ii) the coupled atmosphere and land 
components and how their interaction governs the terres- 
trial environment; and (iii) the high-latitude upper atmo- 
sphere and its extension into the magnetosphere with 
emphasis on predicting and mitigating effects on commu- 
nications and defense systems. A separate recommenda- 
tion is for high priority research to resolve the major 
health, behavioral, and cultural problems related to the 
arctic environment. Recommendations are also made 
concerning support services and management.

THE ARCTIC IS IMPORTANT FOR MANY REASONS: DEFENSE,
economic, political, and scientific (14). The Arctic Research
and Policy Act of 1984 has now put some of these interests
into sharper focus. Its stated purposes are “to establish national
policy, priorities, and goals and to provide a Federal program plan
for basic and applied scientific research with respect to the Arctic,
including natural resources and materials, physical, biological and
health sciences, and social and behavioral sciences” [5, Section
102(b)(1)]. The act established two cooperating groups to carry out
its intent: (i) an advisory Arctic Research Commission consisting of 
five presidential appointees and the director of the National Science 
Foundation, who serves as an ex officio, nonvoting member, and (ii) 
an executive Interagency Arctic Research Policy Committee, con- 
sisting of a representative from ten named federal agencies and 
possibly others, which is chaired by the National Science Founda- 
tion representative. 
Passage of the act reflected an increasing awareness in Alaska, in 
Washington, and among scientists and others that U.S. arctic





A. L. Washburn is a professor emeritus in the Department of Geological Sciences and
the Quaternary Research Center, University of Washington, Seattle, WA 98195, and is
a member of the U.S. Arctic Research Commission. G. Weller is a professor in the 
Geophysical Institute, University of Alaska, Fairbanks, AK 99701, and is chairman of 
the Polar Research Board of the National Research Council.


At once a means of storage and an organ for processing
information, associative memories resemble the biological brain in
their capacity for learning, their fault tolerance, and the
parallelism inherent in their operation. Objects of curiosity a few
years ago, these new memories are beginning to prove their
effectiveness in solving many problems (classification, pattern
recognition, image processing, and others) that are very difficult
and costly for conventional computers to solve.

“I erased all of Hal's memories from the exact moment the trouble
started.”

“How did you do it? (...) You could not have carried out a simple
chronological erasure. You needed a computer tapeworm, attacking
certain words and certain concepts.”

“(...) It is possible to design a program that is injected into a
system to hunt down and destroy specific information.”

This is how Arthur Clarke describes, in 2001, Odyssée deux, [...]

[...] Germany, at the same time as J. von Neumann in the United
States, on the design of programmable computers, gave the following
definition: “Associative memory is a memory in which one has access
to the content of the memory, not to an address. We know that the
human brain works on this principle...”

This idea that memory is made of associations between concepts is
very old, since traces of it can already be found in the time of
Aristotle. It made its appearance in computing through research on
the use of simple associations to represent the meaning of words in
databases.

“But,” K. Zuse continued, “this step toward associative memory has
not been taken by the computer industry. The technology has not yet
reached that stage.”

Meanwhile, research on these memories has considerably [...]

[...] the concrete applications which, [...]

[...] pattern recognition, reading of handwritten characters,
sorting and [...] misspelled [...], systems ex[...]

[...] utopian. For a long time, [...]

[...] capacity for memorization. Biological memory, that of humans
in particular, preserves past states of consciousness and the
elements attached to them, and these can be recalled, voluntarily or
not; in it, the function of storing information is intrinsically
linked with that of processing that information.

Fig. 1. The set of free evolutions of a network can be represented by a
graph whose nodes are the states and in which each arc represents the
transition from one state to another. Consider the case of a binary network
with five neurons, which therefore has $2^5 = 32$ possible states, numbered
0 to 31. In our example, the network has an attracting stable state,
state 30 (a); an attracting cycle, 25-10-1-27-25 (b); an isolated stable
state, 9 (c); and an isolated cycle, 2-8-19-22-15-2 (d).
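
The state graph described in the Fig. 1 caption (nodes are network states, arcs are transitions, and a five-neuron binary network has 32 states) can be explored by enumeration. The sketch below is an illustration under assumptions: it uses ±1 states, a fully parallel threshold update, and a hypothetical ring of five neurons in which each neuron copies its predecessor. It does not reproduce the network of the figure, whose weights are not given, and the function names are hypothetical.

```python
from itertools import product

def step(state, T, theta):
    """Fully parallel update: each neuron takes the sign of its net input."""
    n = len(state)
    return tuple(
        1 if sum(T[i][j] * state[j] for j in range(n)) - theta[i] >= 0 else -1
        for i in range(n)
    )

def attractors(T, theta):
    """Follow every state until it repeats; collect the cycles reached.

    A stable state appears as a cycle of length 1.
    """
    n = len(theta)
    found = set()
    for start in product((-1, 1), repeat=n):
        seen = {}                      # state -> index along the path
        s = start
        while s not in seen:
            seen[s] = len(seen)
            s = step(s, T, theta)
        path = list(seen)              # dicts keep insertion order
        cycle = path[seen[s]:]         # part of the path that repeats
        r = cycle.index(min(cycle))    # rotate so each cycle is stored once
        found.add(tuple(cycle[r:] + cycle[:r]))
    return found

# Hypothetical asymmetric ring: neuron i listens only to neuron i - 1.
n = 5
ring = [[1 if j == (i - 1) % n else 0 for j in range(n)] for i in range(n)]
ring_attractors = attractors(ring, [0] * n)
print(sorted(len(c) for c in ring_attractors))  # cycle lengths found
```

In this asymmetric example the states simply rotate around the ring, so besides the two uniform stable states every state lies on a cycle of length 5. With a symmetric weight matrix, by contrast, fully parallel operation is known to end in cycles of length at most 2.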

[...] a given sensory stimulus with a given type of behavior, as well
as to find solutions to certain problems by intuition, or by
reasoning by analogy with similar known situations. This last mode
of reasoning is usually the most used and also the most effective:
we observe that, in reality, similar causes produce similar effects.
But it is no longer at all appropriate when we have to deal with
algebraic operations or formal logic. That is why the computer
surpasses humans in these domains, whereas for observation,
interpretation, and understanding the brain is far superior to the
machine.

The functioning of the brain was studied well before the appearance
of this type of memory, and it is probably the biological model that
inspired their development [...] given to certain memory structures
[...]

In computers, the parts that perform the functions of storing and
recalling information are also called memories, although they
accomplish these tasks in an entirely different way. In conventional
computer memories, whether random-access memory (RAM), read-only
memory (ROM), or mass storage (magnetic tapes or disks, optical
disks, and so on), information is organized sequentially, in the
form of series of bits (0 or 1). To make them easier to access, the
bits are generally grouped into words of 8 or 16 bits, sometimes
more, arranged in sequential order in cells numbered by an
“address.” Each item of information is thus accessible,
unambiguously, through this address.

In an associative memory, information cannot be localized at fixed
locations; instead, each stored item is distributed over the whole
structure that makes up the memory. We recognize here one of the
characteristics of holography (cf. Micro-Systèmes no. 72, page 79).
In this respect it also resembles biological memory which, if the
conclusions of neurophysiologists are to be believed, cannot be
localized in the brain, so that the destruction of a large number of
neurons, whether natural and gradual in the course of aging or
sudden in an accident, does not erase particular memories but rather
degrades the memory function as a whole.

Because data are not localized in an associative memory, the notion
of an address does not exist. How, then, can the stored information
be accessed? The only possible inputs are themselves memory
contents. Associative memories are therefore content-addressable
memories, or CAM (Content Addressable Memory). What matters is that
the input item be associated with the one to be retrieved at the
output.

This association can be formulated mathematically as
y = Ax, expressing that A is a transformation that maps the input x
to the output pattern y.

Dans une mémoire classique, x 
serait l'adresse, y la donnée 
contenue en x, alors que dans 
une memoire associative, x et y 
sont tous deux des contenus de 
mémoire : x peut étre un mot clé 
associé a l'information y cher- 
chée, ou une Partie de y, ou en- 
core une autre information asso- 
ciée a y dans la mémoire. 

Cette propriété d'association a 
été invoquée pour expliquer cer- 
taines hypothéses concernant 
Vaptitude du cerveau 4 associer a 


associatives : « neurones », « SYy- 
Napses », « potentiel synaptique », 
etc., temoignent d'ailleurs de leur 
parenté avec les structures céra- 
brales (Micro-Systémes n° 61 
page 80). 

A instar du cerveau, les mé- 
Moires associatives sont des 
structures dynamiques, dont 
lévolution peut atre représentée 
Par un graphe: les noeuds sont 
les états, ou configurations, et les 
arcs représentent les transitions 
entre ces etats (fig. 1). 

On distingue les mémoires 
auto-associatives, qui associent a 
une forme le modéle meéemorisé le 
plus proche, et les mémoires hé- 
téro-associatives Qui stockent 
une relation entre deux ou plu- 


Mars 1987 


eo re Ni 


re 
i 





sieurs informations, de sorte que, 
si la forme entrée est proche de la 
premiére donnée, la mémoire four- 
nit en sortie la seconde forme du 
couple. 

Auto-associative memories make it possible to retrieve an item of information from a part of it and to remove noise from it, whether that noise is random, as is generally the case with images, or not, as with the distorted characters of handwriting.

Hetero-associative memories have a very interesting application in expert systems: the "production rules", expressed in the form "if condition then conclusion" (if x then y), are stored in the memory. The inference engine matches this set of rules against the set of facts making up the "fact base". The facts x are presented to the memory in the utilization phase, and the associative memory outputs the corresponding conclusion y.

This mode of inference, one of the most widely used in expert systems, is based on the rule of mathematical logic known as modus ponens, according to which, if A implies B and A is true, then B is true.


Fuzzy memories


Bart Kosko, of the University of California (Irvine), has highlighted the importance of fuzzy logic (Micro-Systèmes no. 64, page 92) in associative memories. In classical modus ponens A → B, when an input or search key A' is presented to the memory, the item B is recalled if and only if A' = A. Yet we have seen that associative memories can identify A' with A if the two patterns are sufficiently close.

Fig. 2. - A fuzzy associative memory can be represented by an FCM (Fuzzy Cognitive Map) graph, whose nodes, Ci, Cj, are variable concepts and whose connections, Eij, are degrees of causality. (After B. Kosko.)

B. Kosko describes the theory of fuzzy associative memories, in which, if the pair (A, B) has been stored and an input A' close to A is presented, the output is a B' close to B. These devices can be applied to expert systems. "The most interesting fuzzy associative memory we have developed is the fuzzy cognitive map, or FCM," says the researcher.

An FCM is a fuzzy directed graph whose nodes are variable concepts or fuzzy sets, and whose connections represent degrees of causality (fig. 2). Thus the activation of concept Ci induces that of Cj with the uncertainty factor eij. The value of an FCM network is that it can inductively infer "rules", relations or the knowledge base, even in the absence of an expert.

Expert systems based on associative memories can answer questions even if these were not included in the system at the outset, which would make it possible to build adaptive expert systems.


Attractors, holes and valleys


A geometric model will help us understand both the learning process and the process of recall by association.

The set of states of the structure forming the associative memory can be represented by a surface, each state corresponding to a point of that surface. Initially, before the learning phase, the surface is perfectly flat, a tabula rasa.

For an auto-associative memory, learning an item of information amounts to digging a "hole" in this surface. At the end of the learning phase the structure looks, in our analogy, like a rubber sheet with hollows and bumps. In the utilization phase, presenting a pattern amounts to releasing a marble from a point on the surface. If the marble is already at the bottom of a hole, it obviously stays there; if it is released near a hole, it is drawn toward it and stops when it reaches the bottom; finally, if it lies too far from any hole, or at equal distance between two holes, its evolution is indeterminate (fig. 3).

This model brings out several properties of associative memories:
1. A state stored by prior learning becomes an "attractor" for all sufficiently close states.
2. The capacity of the memory is not unlimited: if two attractors are too close, classification is no longer deterministic.
3. The classification speed is independent of the number of attractors.

Such a memory partitions the input states, each state being associated with an attractor provided it is not too far from it. In this sense the partition is fuzzy, since some states may evolve indeterminately toward several attractors, and others toward none.

The rubber surface is of course only an image, since we have seen that the information is not localized but distributed over the whole structure. The image actually corresponds to the values of the potential energy of the structure. Each point of the surface corresponds to a configuration of the memory characterized by a certain value of this energy. The stored items are identified with stable, or equilibrium, points of a dynamical system, that is, with local minima of the energy (holes), toward which neighbouring states are attracted.
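The idea that stored items sit at local minima of an energy can be checked numerically on a small case. The sketch below is not taken from the article: it assumes the usual quadratic energy of Hopfield-type networks, E(x) = -1/2 Σ w_ij x_i x_j, with weights built as a sum of outer products with a zero diagonal and all thresholds set to zero. The two six-component patterns are illustrative choices.

```python
def hebb_weights(patterns):
    """Sum of outer products of the stored patterns, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def energy(w, x):
    """Assumed energy: E(x) = -1/2 * sum_ij w[i][j] * x[i] * x[j]."""
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def flip(x, i):
    """Copy of x with component i inverted (a neighbouring state)."""
    y = list(x)
    y[i] = -y[i]
    return y

stored = [[1, -1, -1, 1, -1, 1], [1, 1, 1, -1, -1, -1]]
w = hebb_weights(stored)

hole = energy(w, stored[0])                                 # bottom of the hole
rim = [energy(w, flip(stored[0], i)) for i in range(6)]     # its six neighbours
print(hole, rim)
```

Every state obtained by inverting a single component of the stored pattern has a higher energy than the pattern itself, which is what the picture of a marble resting at the bottom of a hole expresses.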

B. Kosko also compares this state space to the surface of a lake dotted with whirlpools. The memory's encoding algorithm sculpts a whirlpool around each stored model, the model lying at the bottom of the whirlpool. If an input pattern is close to a model, it is sucked in by the nearest whirlpool and recognized as resembling the corresponding model.

Hetero-associative memories can be represented in a similar way, but instead of digging attractor holes in the surface, one digs "valleys" that carry the marble from one point (or region) of the surface to another.

Fig. 3. - An associative memory can be pictured as a rubber sheet (horizontal axis: configuration space). The stored patterns, A and B, deform the surface by creating basins of attraction. If the pattern presented to the memory, A', is close to A in configuration space, it falls quickly toward A. The height corresponds to the potential energy of the configurations. It is geometrically apparent that the classification speed is independent of the number of basins, and hence of the number of stored patterns. (After B. Kosko.)

Fig. 4. - An associative memory performs several functions: a) error correction (auto-associative memory); b) classification (hetero-associative memory); c) pattern recognition (auto- or hetero-associative, depending on whether the memory outputs the complete pattern originally stored or an item of information associated with that pattern; panel label: "This is a two"). (After L. Personnaz.)

Perfect pupil or palimpsest

Three functions characterize associative memories:

Memorization. This takes place through a kind of learning, as we shall see below. It is an "intelligent" function, unlike the passive memorization of conventional memories. An associative memory is not strictly limited in capacity, but beyond a certain threshold various phenomena may occur that prevent further information from being stored. This shows up:

- either as an outright refusal to acquire a new item, while the memory remains able to restore exactly the items learned previously: this is the "perfect pupil syndrome";

- or as a confusion that disturbs what has already been learned;

- or as an overprinting of the new information on the old, erasing the latter, in the manner of palimpsests, the manuscript parchments whose first writing medieval monks scraped off so that they could be reused.

It also sometimes happens that associative memories return "false" models, that is, they have stable states that correspond to nothing that was learned. These spurious states are analogous to the impression of "déjà vu" in biological memory.

Decision making. This is linked to the existence of a threshold above which the system spontaneously settles into the state corresponding to a previously stored model. It is therefore the memory itself that decides which of the models corresponds to the pattern presented, even if the pattern is incomplete or inexact. For example, in the experiment carried out by Gérard Dreyfus's team at the École supérieure de physique et chimie de Paris (ESPCI), the structure stores, during the learning phase, a series of phrases (lines of verse, titles of publications, authors' names, etc.); in a second phase, called utilization or recall, distorted or incomplete versions are presented at the input of the memory. The output is the correct form of the phrase, that is, the one among the stored phrases that is closest to the input phrase (fig. 4a).

Classification. The previous example corresponds to the auto-associative function. If the memory is hetero-associative, it can automatically associate different concepts, for example the name of the author with the line of verse presented at the input; even if the line is distorted, the associated name is returned in its correct form (fig. 4b). The association must of course have been learned beforehand.

This last function is useful in pattern recognition, particularly for handwritten characters: the memory associates with the pattern a word or phrase that defines it unambiguously (fig. 4c).

How can these associations, so reminiscent of the faculties of living beings, be implemented on artificial structures?

Research is being carried out in collaboration with neurophysiologists. This is the case at the École normale supérieure, where Gérard Toulouse and Jean-Pierre Changeux study associative memories on the basis of what is known about brain function. Researchers have also brought out the analogy between neural networks and certain physical structures that obey the laws of statistical mechanics. J. J. Hopfield, in particular, was one of the first to find a model for associative memories.

Consider a set of cells, or "formal neurons", linked together by "synapses", each neuron being connected to all the others, like Indra's net of pearls, created by Indian mythology more than 2,500 years ago.

Each of these neurons can take certain values depending on those it receives as input, and changes its state accordingly; a numerical value is assigned to that state. Two categories of neurons are distinguished: "analog" neurons, which can take any value between -1 and +1, and "binary" neurons which, following the McCulloch and Pitts model, have only two possible states: an active state, corresponding to the value +1, and an inactive state, denoted -1. A configuration of a network of n binary neurons is therefore a binary vector with n components.

To the n neurons forming the nodes of the network correspond n x n synapses. The efficacy Cij of a synapse (from neuron i to neuron j) can be positive (the synapse is then excitatory), negative (inhibitory synapse) or zero (no synapse) (fig. 5).

Entering an item of information into such a network amounts to activating some neurons and deactivating others so as to obtain a certain configuration. The network then evolves spontaneously toward a stable configuration, or attractor, each neuron computing its output value, or "synaptic potential", from its inputs, that is, from the states of all the neurons weighted by their synaptic coefficients, according to a computation algorithm.

Fig. 5. - Fully connected network of five neurons: each neuron is linked to all the others by a synaptic coefficient Cij. In the diagram, the circles represent the neurons with their corresponding threshold (θi), and the dots the synapses. (After L. Personnaz, I. Guyon and G. Dreyfus.)

Fig. 6. - Neural network corresponding to the autocorrelation matrix formed by storing the patterns x1 = (1 -1 -1 1 -1 1) and x2 = (1 1 1 -1 -1 -1). 1 corresponds to an active neuron, -1 to an inactive neuron. Positive synaptic coefficients denote mutual excitation of the neurons; negative coefficients, inhibition. Null synapses are not shown. (After B. Kosko.)
To understand how associative memories work, we shall begin with the linear case, whose theory has been described at length by T. Kohonen. We shall follow the method proposed by B. Kosko.

Let us return to the formula stated above:

y = Ax

where x and y are configurations, or state vectors, of the memory, and A is a matrix.

Fig. 7. - Projection of the vector y onto the subspace (plane) defined by x1 and x2.

Consider an associative memory made up of a network of six binary neurons. The state vectors therefore have six components, each of which can take the value +1 or -1. We want to store two configurations in this network:

x1 = (1 -1 -1 1 -1 1)
and
x2 = (1 1 1 -1 -1 -1)


A distance can be defined in configuration space; we choose the Hamming distance, H(x1, x2), equal to the number of positions at which the binary values of x1 and x2 differ. In our case, H(x1, x2) = 4. It is important that this distance (which corresponds roughly to the geometric distance between the holes in the rubber surface) be large enough for the memory to tell the two states apart.
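The Hamming distance of the example can be checked directly. The short sketch below takes x1 = (1, -1, -1, 1, -1, 1), the value consistent with H(x1, x2) = 4 and with the matrix A of the article; the function names are illustrative. For vectors with components ±1 the distance can also be read off the scalar product, since each differing position contributes -1 and each matching position +1.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

def dot(a, b):
    """Scalar product of two vectors."""
    return sum(u * v for u, v in zip(a, b))

x1 = (1, -1, -1, 1, -1, 1)
x2 = (1, 1, 1, -1, -1, -1)

h = hamming(x1, x2)
# For +1/-1 vectors of length n: dot = n - 2 * hamming.
h_from_dot = (len(x1) - dot(x1, x2)) // 2
print(h, h_from_dot)
```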

Learning consists in computing the auto-association matrix A, called the "synaptic matrix"; it must satisfy

Ax1 = x1 and Ax2 = x2

which amounts to saying that if one of the previously learned items is presented again, it comes out identical to itself. A is the projection matrix onto the space of stored patterns. Here is how it is computed: from the vector x1 we obtain the matrix x1^T x1, where x1^T is the transpose of the vector x1:

$$
x_1^T x_1 =
\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \\ -1 \\ 1 \end{pmatrix}
\begin{pmatrix} 1 & -1 & -1 & 1 & -1 & 1 \end{pmatrix}
=
\begin{pmatrix}
 1 & -1 & -1 &  1 & -1 &  1 \\
-1 &  1 &  1 & -1 &  1 & -1 \\
-1 &  1 &  1 & -1 &  1 & -1 \\
 1 & -1 & -1 &  1 & -1 &  1 \\
-1 &  1 &  1 & -1 &  1 & -1 \\
 1 & -1 & -1 &  1 & -1 &  1
\end{pmatrix}
$$

and we do the same with x2. We thus obtain:

$$
A = x_1^T x_1 + x_2^T x_2 =
\begin{pmatrix}
 2 &  0 &  0 &  0 & -2 &  0 \\
 0 &  2 &  2 & -2 &  0 & -2 \\
 0 &  2 &  2 & -2 &  0 & -2 \\
 0 & -2 & -2 &  2 &  0 &  2 \\
-2 &  0 &  0 &  0 &  2 &  0 \\
 0 & -2 & -2 &  2 &  0 &  2
\end{pmatrix}
$$


Each matrix element, Aij, gives the value of the connection, or synapse, between neuron i and neuron j.

If Aij > 0, neuron i activates neuron j (σj = +1).

If Aij < 0, neuron i deactivates neuron j (σj = -1).

If Aij = 0, neuron i does not affect neuron j (fig. 6).
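The construction of the synaptic matrix as a sum of outer products can be reproduced in a few lines. This is a minimal sketch in plain Python; the helper names `outer` and `add` are illustrative, and x1 is taken as (1, -1, -1, 1, -1, 1), the value consistent with the matrix A of the article.

```python
def outer(u, v):
    """Outer product: entry (i, j) is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]

def add(m1, m2):
    """Entry-by-entry sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

x1 = [1, -1, -1, 1, -1, 1]
x2 = [1, 1, 1, -1, -1, -1]

# Synaptic matrix: each stored pattern contributes its own outer product.
A = add(outer(x1, x1), outer(x2, x2))
for row in A:
    print(row)
```

An entry is +2 where the two neurons agree in both patterns, -2 where they disagree in both, and 0 where the two patterns give conflicting indications.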

If we now present a state vector y different from x1 and x2, multiplying y by the matrix A gives a vector y', possibly different from x1 and x2 but lying in the subspace defined by these vectors (fig. 7). The distance H between y' and x1 can be computed, and likewise with x2. y' is associated with one of them if its distance from it is below a certain threshold. It is generally up to the user to set this threshold, since it cannot be defined once and for all. In general, then, a linear associative memory can give only a numerical indication of resemblance, the decision being left to the user.
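Linear recall followed by a user-set threshold can be sketched as follows. Several choices here are assumptions made for illustration: y' is reduced to ±1 components by taking signs before the Hamming distance is measured, the threshold value is arbitrary, and the input y is x1 with its second component inverted. Note also that the unnormalized matrix A of the example returns 6·x1 - 2·x2 when applied to x1, so the stored pattern is recovered only after taking signs, not as an exact equality Ax1 = x1.

```python
def matvec(m, v):
    """Product of a matrix and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sign(v):
    """+1 for a positive value, -1 otherwise (zero is mapped to -1)."""
    return 1 if v > 0 else -1

def hamming(a, b):
    return sum(1 for u, v in zip(a, b) if u != v)

A = [[2, 0, 0, 0, -2, 0],
     [0, 2, 2, -2, 0, -2],
     [0, 2, 2, -2, 0, -2],
     [0, -2, -2, 2, 0, 2],
     [-2, 0, 0, 0, 2, 0],
     [0, -2, -2, 2, 0, 2]]
stored = {"x1": [1, -1, -1, 1, -1, 1], "x2": [1, 1, 1, -1, -1, -1]}

y = [1, 1, -1, 1, -1, 1]                 # x1 with its second component inverted
y_prime = matvec(A, y)                   # linear recall
binary = [sign(v) for v in y_prime]      # reduce to +1/-1 before comparing

distances = {name: hamming(binary, p) for name, p in stored.items()}
THRESHOLD = 1                            # hypothetical value chosen by the user
matches = [name for name, d in distances.items() if d <= THRESHOLD]
print(y_prime, distances, matches)
```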

For decision making to be built into the associative memory, a nonlinearity must be introduced into its operation. In the case of a network of binary neurons, the problem comes down to setting a decision threshold for each neuron.


Networks of formal neurons


John Hopfield (California Institute of Technology) highlighted the analogy between certain physical structures, in particular "spin glasses", and networks of formal neurons. He showed that if, instead of applying the synaptic matrix to the state vector once, the operation is repeated several times on neurons each capable of making a decision at every iteration, these networks always evolve toward a stable state. The sum of all the decisions, made by every neuron at every iteration, results in a decision by the system as a whole.

Whereas in linear associative memories A was a projection matrix onto the space of stored states, the matrix Cij of neural networks determines the interactions between the neurons.

J. Hopfield had the idea of using learning rules inspired by neurophysiologists, notably Hebb's rule. Unfortunately, it leads to stable states that do not always correspond to a previously learned item. This rule is therefore not very reliable.

L. Personnaz, I. Guyon and G. Dreyfus, of the electronics laboratory of the ESPCI, have derived from this rule a new algorithm which guarantees, under fairly general conditions, the restitution of the stored model, in line with the perfect pupil syndrome.

With this generalized Hebb rule, which applies to a network of n fully interconnected binary neurons, all the neurons update their state at each iteration: the state σi(t+1) of neuron i at a given instant is computed from the states of all the other neurons at the previous instant (t). To do so, neuron i computes a weighted sum, which it compares with a threshold value:

$$
\sum_{j} C_{ij}\,\sigma_j(t)
\begin{cases}
> \theta_i \;\Rightarrow\; \sigma_i(t+1) = +1 \\
< \theta_i \;\Rightarrow\; \sigma_i(t+1) = -1 \\
= \theta_i \;\Rightarrow\; \sigma_i(t+1) = \sigma_i(t)
\end{cases}
$$
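One iteration of the threshold rule can be written as a single function. The sketch below follows the three cases (above, below, or exactly at the threshold θi); the matrix C reuses the six-neuron matrix of the linear example and the thresholds are set to zero, both of which are choices made here for illustration rather than the parameters of the generalized Hebb rule.

```python
def step(C, theta, sigma):
    """One synchronous update: every neuron compares its weighted sum
    with its threshold; at equality it keeps its current state."""
    new_state = []
    for i, row in enumerate(C):
        h = sum(c * s for c, s in zip(row, sigma))
        if h > theta[i]:
            new_state.append(1)
        elif h < theta[i]:
            new_state.append(-1)
        else:
            new_state.append(sigma[i])
    return new_state

C = [[2, 0, 0, 0, -2, 0],
     [0, 2, 2, -2, 0, -2],
     [0, 2, 2, -2, 0, -2],
     [0, -2, -2, 2, 0, 2],
     [-2, 0, 0, 0, 2, 0],
     [0, -2, -2, 2, 0, 2]]
theta = [0] * 6                      # assumed: all thresholds at zero

noisy = [1, 1, -1, 1, -1, 1]         # stored pattern with one component inverted
print(step(C, theta, noisy))
```

Because the new state list is built from the old one without modifying it, all neurons are updated from the same instant t, which is what a synchronous update requires.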


The parameters of the neural network are the (n x n) square matrix of coefficients Cij and the n-component vector θi. Determining these parameters amounts to learning. Thus every element of the matrix stores a part of each of the stored items. "In practice," L. Personnaz explains, "learning is usually done iteratively: one of the items to be learned is presented at the input, and the parameters of the memory are computed so as to obtain the desired response at the output; the process is repeated until all the items have been stored (or the maximum storage capacity has been reached)."

The state of the network is defined by the n-component vector σ. Presenting a pattern to the memory amounts to initializing the system with the state vector [σ1(0) ... σi(0) ... σn(0)].

All operations are performed synchronously on all the neurons. The process is iterative: at each period the configuration of the network changes until it reaches a stable state. This state is always reached quickly thanks to the feedback inherent in the threshold comparison. "These learning rules prove reliable enough to be used for handwritten character recognition," adds G. Dreyfus.
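The iterative recall can be sketched as a loop that repeats the synchronous update until the state stops changing. The functions `step` and `run`, the iteration cap and both example networks are illustrative assumptions. The cap is there because, for an arbitrary symmetric matrix, a fully synchronous update may also end in an oscillation between two states instead of a stable state; the second, two-neuron example shows such a case.

```python
def step(C, theta, sigma):
    """One synchronous threshold update (state kept when sum == threshold)."""
    out = []
    for i, row in enumerate(C):
        h = sum(c * s for c, s in zip(row, sigma))
        out.append(1 if h > theta[i] else -1 if h < theta[i] else sigma[i])
    return out

def run(C, theta, sigma, max_iter=20):
    """Iterate until the state no longer changes.
    Returns (final state, number of state changes, reached a stable state?)."""
    sigma = list(sigma)
    for n in range(max_iter):
        nxt = step(C, theta, sigma)
        if nxt == sigma:
            return sigma, n, True
        sigma = nxt
    return sigma, max_iter, False

C = [[2, 0, 0, 0, -2, 0],
     [0, 2, 2, -2, 0, -2],
     [0, 2, 2, -2, 0, -2],
     [0, -2, -2, 2, 0, 2],
     [-2, 0, 0, 0, 2, 0],
     [0, -2, -2, 2, 0, 2]]
print(run(C, [0] * 6, [1, 1, -1, 1, -1, 1]))   # settles on a stored pattern

# Two mutually inhibiting neurons updated together keep swapping states.
C2 = [[0, -1], [-1, 0]]
print(run(C2, [0, 0], [1, 1]))
```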

Indeed, under the generalized Hebb rule, the final states always correspond to previously learned states, unless the initial states are too far from them. Thus the system developed by L. Personnaz and colleagues, in which the network is simulated on a microcomputer, has been used to correct article titles in a bibliography and to recognize handwritten digits, with a


Fig. 8. - In the handwritten digit recognition system developed at the ENST, each shape is digitized into an image of 600 pixels (30 x 20), each of which can be either white or black. Each pixel corresponds to a neuron whose state is defined by the colour (black: σi = +1; white: σi = -1). The last ten pixels are reserved for coding the ten possible classes of decimal digits. The dimension of the representation space (here n = 600) must be sufficiently large compared with the number of prototypes to be stored, given that, during learning, three to five models are kept for each class of digit. All the prototypes of a class are recorded with the same code field: (-1, -1, ..., -1, +1) for 1, (-1, -1, ..., +1, -1) for 2, and so on (a). In the utilization phase, unknown characters are presented with the code field set to -1 everywhere. After processing, one of the bits of this field switches to +1 and the character is recognized as belonging to the corresponding class. The stable state reached by the neural network after a few iterations (3 to 10 in this experiment, though the identification is often already made at the first iteration) corresponds either to a prototype or to a linear combination with a predominance of prototypes of the correct class (b). The network is therefore able to generalize patterns by self-learning. Since no single bit is decisive in the representation, image defects are generally corrected in the process. (After L. Personnaz.)
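The code field described in the caption can be illustrated with a small encoding and decoding sketch. The split into 590 image components followed by 10 code components follows the caption; the function names are hypothetical, and the numbering of classes from 1 to 10 (with the +1 placed k positions from the right for class k) is an assumption extrapolated from the two examples given, since the caption does not say how the tenth class is assigned.

```python
N_PIXELS = 590    # image part of the 600-component state
N_CLASSES = 10    # code field appended after the image

def code_field(k):
    """Code field for class k (1..10): a single +1, k positions from the right."""
    field = [-1] * N_CLASSES
    field[N_CLASSES - k] = 1
    return field

def prototype(pixels, k):
    """State stored during learning: image followed by its class code."""
    return list(pixels) + code_field(k)

def unknown(pixels):
    """State presented in the utilization phase: code field at -1 everywhere."""
    return list(pixels) + [-1] * N_CLASSES

def read_class(state):
    """Class read from the code field, or None if not exactly one bit is +1."""
    field = state[-N_CLASSES:]
    hits = [N_CLASSES - i for i, v in enumerate(field) if v == 1]
    return hits[0] if len(hits) == 1 else None

blank = [-1] * N_PIXELS
print(code_field(1), code_field(2))
print(read_class(prototype(blank, 2)), read_class(unknown(blank)))
```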
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OPTION BASE 1 
N=64
DIM Impression$[100]
DIM A(100)
DIM O(100)
DIM T(100,100)
DIM V(100)
DIM V1(100)
REDIM A(N)
REDIM O(N)
REDIM T(N,N)
REDIM V(N)
REDIM V1(N)
! ******************** LETTER A ********************
DATA 0,0,0,0,0,0,0,0
DATA 0,1,1,1,1,1,1,0
DATA 0,1,1,1,1,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,1,1,1,1,0
DATA 0,1,1,1,1,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,0,0,1,1,0
! ******************** LETTER O ********************
DATA 0,0,0,0,0,0,0,0
DATA 0,0,1,1,1,1,0,0
DATA 0,1,1,1,1,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,1,1,1,1,0
DATA 0,0,1,1,1,1,0,0
! *************** READ THE 2 LETTERS ***************
FOR I=1 TO N
READ A(I)
NEXT I
PRINT "A="
MAT V=A
GOSUB Impression
FOR I=1 TO N
READ O(I)
NEXT I
PRINT "O="
MAT V=O
GOSUB Impression
! ********** COMPUTE THE SYNAPTIC MATRIX **********
! The loop bounds and the two assignments are reconstructed from badly printed lines.
FOR I=1 TO N-1
FOR J=I TO N
T(I,J)=(2*A(I)-1)*(2*A(J)-1)+(2*O(I)-1)*(2*O(J)-1)
T(J,I)=T(I,J)
NEXT J
NEXT I
T(N,N)=0
! ************* PROPOSED INPUT VECTOR *************
! Rows 6 and 8 contain digits that are illegible in the printed listing; they are transcribed as 0 here.
DATA 0,0,0,0,0,0,0,0
DATA 0,0,1,1,1,0,0,1
DATA 0,1,1,1,0,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,1,1,0,1,0,1,0
DATA 0,1,1,0,0,1,1,0
DATA 0,0,1,0,1,1,0,0
FOR I=1 TO N
READ V(I)
NEXT I
PRINT "V="
GOSUB Impression
! *************** RUN THE ALGORITHM ***************
! The printed listing jumps back with GOTO 716; a label replaces the missing line number.
Iteration: MAT V1=T*V
MAT V=V1-(.1)
! Ensures correct thresholding with the mathematical function SGN: 0 <---> -1.
MAT V=SGN(V)
GOSUB Impression
PAUSE
GOTO Iteration
! ***************** DISPLAY ROUTINE *****************
! Only the loop skeleton, the PRINT statements and the RETURN are legible in the
! printed listing; the tests inside the loop are reconstructed.
Impression: FOR I=0 TO SQR(N)-1
Impression$=""
FOR J=1 TO SQR(N)
IF V(I*SQR(N)+J)>0 THEN Impression$=Impression$&"*"
IF V(I*SQR(N)+J)<=0 THEN Impression$=Impression$&" "
NEXT J
PRINT Impression$
NEXT I
PRINT "--------"
RETURN
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SIMULATION 
OF AN ASSOCIATIVE MEMORY
ON A MICROCOMPUTER

by Philippe Lalanne
(Institut d'optique théorique et appliquée, Orsay)

1 - Basic program using the Hopfield algorithm.

2 - An example of the associative-memory phenomenon: N = 64 neurons; two stored states: A and O.

Initial state of the neurons: V.

After two iterations, the network converges to state A.

(Many other states V could be proposed; in general, convergence to one of the two letters takes one iteration.)

3 - A counterexample: the proposed state V, although only slightly noisy (compared with the previous example), converges in one iteration to a stable state not intended by the algorithm.

Note: starting from the program given here, it is very easy to store other letters. But beware: to get good results when storing more than about ten, the number of neurons, N, must be increased.

success rate of 80%, with 10% of the patterns misidentified and the remaining 10% not recognized at all (fig. 8).
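The simulation described in the sidebar (64 neurons, letters A and O stored as 8 x 8 bitmaps) can be approximated in Python. This is a sketch under stated assumptions, not a transcription of the BASIC listing: states are held as ±1 throughout, the diagonal of the synaptic matrix is zero, a zero sum is mapped to -1, and the noisy input is a hypothetical one (letter A with three corner pixels inverted) rather than the vector V printed in the magazine.

```python
A_ROWS = ["00000000", "01111110", "01111110", "01100110",
          "01111110", "01111110", "01100110", "01100110"]
O_ROWS = ["00000000", "00111100", "01111110", "01100110",
          "01100110", "01100110", "01111110", "00111100"]

def to_state(rows):
    """Bitmap rows of '0'/'1' characters -> flat list of -1/+1 neuron states."""
    return [1 if ch == "1" else -1 for row in rows for ch in row]

def hebb(patterns):
    """Synaptic matrix: sum of outer products, with a zero diagonal."""
    n = len(patterns[0])
    t = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    t[i][j] += p[i] * p[j]
    return t

def step(t, v):
    """Synchronous update; a zero weighted sum is mapped to -1."""
    return [1 if sum(w * s for w, s in zip(row, v)) > 0 else -1 for row in t]

def show(v, width=8):
    """Text rendering of a state, one row of the bitmap per line."""
    return "\n".join("".join("#" if s > 0 else "." for s in v[r:r + width])
                     for r in range(0, len(v), width))

a, o = to_state(A_ROWS), to_state(O_ROWS)
T = hebb([a, o])

noisy = list(a)
for i in (0, 7, 63):          # hypothetical noise: three corner pixels inverted
    noisy[i] = -noisy[i]

recalled = step(T, noisy)
print(show(noisy), show(recalled), sep="\n--------\n")
```

With these two letters both stored states are left unchanged by an update, and the noisy input above is restored to the letter A.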

"Ideally, we are trying to build character recognition systems that are independent of size, position and rotation," says G. Dreyfus.

The parallelism that results from distributing the information over the whole network means that the processing time is theoretically independent of the number of stored items, whereas with conventional memories, which require a sequential search, computation time grows very quickly with the number of items recorded. If there are too many models, what is lost is not speed but accuracy, and hence reliability.


Simulating neural networks


When neural networks are simulated on a computer, the computations are of course relatively long, because computers cannot take full advantage of the parallelism inherent in these networks; nevertheless the dynamical system converges quickly, after a few iterations.

Moreover, these simulations on a sequential computer are useful for understanding how associative memories work. This is the solution chosen by the ESPCI researchers, who have fitted the computer (a PC-type machine) with specialized processors, such as the Inmos Transputer.

In the longer term, the construction of neural networks integrated on silicon is envisaged. The ESPCI, in particular, is working on this project in collaboration with other laboratories, both French and foreign. These studies concern the use of conventional or advanced microelectronic technologies, such as WSI (Wafer Scale Integration: integration over an entire silicon wafer). Since the main obstacle of WSI technology is the inevitable presence of defects (it is practically impossible to have zero defects over the whole wafer with current processes), the redundancy, and hence defect tolerance, of neural networks would be put to good use. For, as P. Peretto of the Centre d'études nucléaires de Grenoble explains, one must "consider an assembly of neurons as a collective system and not as a circuit in which each element would have a fixed function".


From electronics to optics

"There now exist hardware devices, called neurocomputers, which behave like brains but resemble analog computers," announces B. Kosko. This researcher has developed fuzzy associative memories based on electronic and electro-optical technologies.

Still in the United States, the Defense Advanced Research Projects Agency (DARPA) has set up a research department in San Diego (California) to develop neural networks. One of them, Mark III, designed specifically for the needs of this research, is intended to help buyers interested in implementing neural networks. Sold for 53,000 dollars, Mark III contains 8,100 elementary processors interconnected by 417,000 links. It operates as a coprocessor to a VAX in a VMS environment.
L’évolution des reseaux de neu- 
rones étant décrite par des équa- 
tions différentielles, les technolo- 
gies analogiques se prétent 
particuliérement bien a la mise en 
ceuvre de ces dispositifs (Mier 
Systémes n° 59 page 104) JAux la" 
boratoires Bell, une réafisation de 
réseau analogique a été faite, tan- 
dis qu’au Centre d'études. nu- 
cléaires de Grenoble des études 
sont en cours au laboratoire de 
Pierre Peretto. AS 

Given the parallelism of the computations carried out on neural networks, optics appears to be a more suitable approach than electronics. There is indeed a trend toward replacing electrons with photons wherever possible in computers, for two reasons, as Pierre Chavel of the Institut d'optique in Orsay explains: "First, the light-matter interaction can be faster than electronic interactions in a semiconductor." This is why a new family of components is emerging: optical valves, or spatial light modulators. These are devices that receive light and can change their transparency (or, conversely, their reflectance) very quickly under optical, electronic, or magnetic control. Areas can thus be darkened in the manner of a photograph, but without requiring any development.

"Second," P. Chavel continues, "two electrons interact, but two photons do not, hence the value of optics for communications." Light beams can indeed cross in free space without interfering. "All algorithms that require a considerable number of communications can be carried out better and faster than on an electronic computer, whether sequential or parallel."

This is why Pierre Chavel, Philippe Lalanne, and Jean Taboury (Institut d'optique) are currently working on building optical associative memories. These are based on two-dimensional arrays of points with variable transparency that use optical valves, which advantageously replace transistors as far as response time is concerned: this time is a few femtoseconds in the former case, against a few picoseconds (a thousand times longer) for the fastest semiconductors, such as gallium arsenide.

Alain Maruani, Gabriel Sirat, and R. Chevalier have built, at the Ecole nationale supérieure des télécommunications (ENST), a network of 48 neurons with a 48 x 48 magneto-optical matrix whose transparency can be changed very quickly (at television frame rate) by a magnetic field.


Optical
associative memories


Demetri Psaltis (California Institute of Technology) and Nabil Farhat (University of Pennsylvania) had the idea of building the first optical networks based on the Hopfield model.

The first implementation, in 1984, comprises a row of n diodes that make up as many binary neurons. The connections are made optically, through a connection mask (or synaptic matrix) of n x n cells, following the layout of figure 9.

A cylindrical lens between the emitting neurons and the synaptic matrix allows each neuron, if it is active, to illuminate an entire column of this matrix (neuron i corresponds to the i-th column, Ci).

At the output of the matrix is placed another cylindrical lens, orthogonal to the first, which focuses the light coming from one row of the matrix onto a point. A series of n photodiodes is placed at the focal points, so that the j-th row of the matrix corresponds to neuron j (N'j).

Finally, for there to be an associative memory, a device providing feedback with thresholding must be added between the receiving neurons and the emitting neurons. Each diode N'j is linked to diode Nj by an electrical or optical connection. If the signal passing through this connection exceeds a certain value, it is amplified and neuron j is activated, that is, diode Nj is lit. If the signal is below the threshold, it is cancelled and neuron j is deactivated (diode Nj off). A neuron therefore consists of the whole formed by the receiving diode, the connection, the thresholding device, and the corresponding emitting neuron.

The thresholding device is generally electronic. It can also be an optical valve. In that case, however, the valve must be modifiable at every iteration, which requires much shorter switching times.
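
The loop formed by the emitting diodes, the mask, the receiving diodes, and the thresholded feedback can be sketched in a few lines. This is only an illustration under stated assumptions: diodes are coded 1 (lit) or 0 (off), the function names are hypothetical, and the example mask is built by hand so that a cell is transparent only where two bits of the pattern are both 1.

```python
def optical_pass(mask, emitters):
    """One pass through the optics: emitter i lights column i of the mask,
    and the second lens sums each row j onto receiving photodiode j."""
    n = len(emitters)
    return [sum(mask[j][i] * emitters[i] for i in range(n)) for j in range(n)]


def threshold_feedback(signals, threshold):
    """Nonlinear feedback: emitter j is lit (1) only if its signal exceeds the threshold."""
    return [1 if s > threshold else 0 for s in signals]


def run_until_stable(mask, emitters, threshold, max_iterations=20):
    """Repeat optics + thresholding until the emitter states stop changing
    (or until max_iterations passes have been made)."""
    state = list(emitters)
    for _ in range(max_iterations):
        new_state = threshold_feedback(optical_pass(mask, state), threshold)
        if new_state == state:
            break
        state = new_state
    return state


# Hypothetical mask storing 100101: cell (j, i) is transparent (1.0)
# only where bits j and i of the pattern are both 1.
pattern = [1, 0, 0, 1, 0, 1]
mask = [[float(pj * pi) for pi in pattern] for pj in pattern]

# Start from a degraded version of the pattern (last bit missing).
recalled = run_until_stable(mask, [1, 0, 0, 1, 0, 0], threshold=1.0)
print(recalled)
```

Starting from the degraded input, the missing bit is restored on the first pass and the second pass confirms that the state no longer changes.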

Learning takes place in the connection matrix, whose elements are made more or less transparent. Suppose we want to teach it the value 100101. This binary number must be written on the i-th row and the i-th column of the synaptic matrix, in the form of white (transparent) elements for 1 and black (opaque) elements for 0. If several values have to be memorized, blacks and whites may be superimposed on some elements of the matrix. These elements must therefore allow a certain number of gray levels. The synaptic matrix built at ENST has 256 gray levels.
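
The need for gray levels when several values are superimposed can be made concrete with a short sketch. The article does not give the exact writing rule, so the one used here is an assumption: each pattern contributes a layer that is transparent at cell (j, i) only where bits j and i are both 1, the layers are averaged, and the result is quantized to the 256 levels quoted for the ENST matrix. The name learn_mask is illustrative.

```python
GRAY_LEVELS = 256  # number of levels quoted for the ENST matrix


def learn_mask(patterns):
    """Superimpose one transparent/opaque layer per pattern and return gray levels.

    Assumption: the layer for a pattern is transparent at cell (j, i) only
    where bits j and i are both 1; layers are averaged, then quantized.
    """
    n = len(patterns[0])
    mask = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            # Fraction of stored patterns that want this cell transparent.
            fraction = sum(p[j] * p[i] for p in patterns) / len(patterns)
            mask[j][i] = round(fraction * (GRAY_LEVELS - 1))
    return mask


patterns = [[1, 0, 0, 1, 0, 1],
            [0, 1, 1, 0, 0, 1]]
mask = learn_mask(patterns)
for row in mask:
    print(row)
```

Cells requested by both patterns come out fully transparent (255), cells requested by neither stay opaque (0), and cells requested by only one pattern take an intermediate gray.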

The team at the Institut d'optique in Orsay is studying the possibility of building this connection matrix with an optical valve. The work concerns a network of 20 neurons. The values of the optical valve, which can vary continuously, are loaded at each learning step.

Gabriel Sirat, who worked with D. Psaltis in the United States, has been pursuing the study of optical implementation at ENST since 1986. The one-dimensional devices we have just seen are only demonstrators: since they can store only binary vectors, they are not yet equivalent to the computer simulations which, as we have seen, involve several hundred neurons.


Associative memories
and holography


To increase the storage capacity of these optical associative memories and to be able to apply them to pattern recognition, a second approach consists of building a two-dimensional neural network. In this case, the connection mask is no longer a matrix but becomes a tensor of order 4.
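
Why the mask becomes a tensor of order 4 can be seen by writing out the indices: each neuron is now located by two coordinates (a, b), so a connection between neurons (a, b) and (c, d) needs four indices. The sketch below is a minimal illustration with hypothetical names; the 2 x 2 tensor is filled by an assumed product rule simply to have something to compute with.

```python
def count_connections(rows, cols):
    """Every neuron of a rows x cols array may connect to every neuron: (rows*cols)^2 weights."""
    return (rows * cols) ** 2


def weighted_input(T, image):
    """signal[a][b] = sum over (c, d) of T[a][b][c][d] * image[c][d]."""
    rows, cols = len(image), len(image[0])
    return [[sum(T[a][b][c][d] * image[c][d]
                 for c in range(rows) for d in range(cols))
             for b in range(cols)]
            for a in range(rows)]


# Hypothetical 2 x 2 example: T stores one image through products of pixel pairs.
stored = [[1, 0],
          [0, 1]]
T = [[[[stored[a][b] * stored[c][d] for d in range(2)] for c in range(2)]
      for b in range(2)] for a in range(2)]

print(count_connections(2, 2))
print(weighted_input(T, [[1, 0], [0, 0]]))
```

The number of weights grows as the square of the number of pixels, which is one reason a physical mask quickly becomes impractical for two-dimensional arrays.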

This mask can be replaced by a hologram (H1) of the shape F to be recognized and by a lens. The hologram is, in fact, the very prototype of an associative memory. The autocorrelation procedure is closely related to the superposition of diffraction fringes on a hologram. Physical analogies between neural networks and the optical systems used to produce holograms have been demonstrated. Besides the distributed nature of the information in a hologram, as in other associative memories, holography makes it possible to correlate similar images. If a hologram is illuminated with an image different from the reference beam, the output is the correlation function of the images.

If the shape entered into the neural network N corresponds exactly to the one memorized on H1, the light beam concentrates into a single bright point O in an image plane. This bright point is called the "correlation point". It is generally surrounded by stray light, and the more the input image is distorted relative to the image recorded on H1, the fainter the point and the stronger the stray light. If the input image is too distorted relative to the memorized image, it is not recognized at all and the correlation point goes out.

Fig. 9. - Optical neural network, following the idea of D. Psaltis and N. Farhat. The neurons are photodiodes Ni aligned horizontally. If they are lit, this corresponds to an active neuron (+1); if they are off, the neuron is inactive (-1).

The synaptic matrix is a mask with variable transparencies, from white to black, which can be modified by learning.

The light emitted by a photodiode illuminates, via a cylindrical lens, one column of the matrix. The light transmitted by the synaptic matrix therefore depends on the state of the emitting photodiodes and on the transparency of the matrix itself. Each row of the matrix is focused, by means of a cylindrical lens, onto a photodiode of a vertical row N'.

Each of these diodes N'j is linked to the corresponding emitting diode Nj by a nonlinear feedback loop: the received signal is compared with a threshold Q. Depending on the result of the comparison, diode Nj is switched on or off, and the process is repeated until it reaches a stable state.

Learning is done by recording an image on hologram H1 or by modifying the hologram. To recognize several shapes, say m of them, the corresponding holograms are multiplexed on H1. Each of the m memorized shapes then corresponds to a correlation point on the image plane O. If shape F1 is recognized, point O1 lights up; if F2 is recognized, O2 lights up; and so on.
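
The correspondence between stored shapes and correlation points can be mimicked numerically. In this sketch, which is a simplification and not the optical computation itself, the brightness of point Ok is modeled as the overlap between the input image and the stored shape Fk, scaled to lie between 0 and 1; stray light is ignored, and the shapes and function names are hypothetical.

```python
def correlation_points(stored_shapes, image):
    """Brightness of each correlation point O_k, modeled as the overlap
    between the input image and stored shape F_k, scaled to [0, 1]."""
    points = []
    for shape in stored_shapes:
        overlap = sum(s * x for s, x in zip(shape, image))
        points.append(overlap / sum(shape))
    return points


def lit_points(points, brightness_threshold):
    """Indices of correlation points bright enough to count as lit."""
    return [k for k, b in enumerate(points) if b >= brightness_threshold]


# Hypothetical shapes, flattened to 0/1 pixel lists.
F1 = [1, 1, 0, 0, 1, 0, 0, 1]
F2 = [0, 1, 1, 0, 0, 1, 1, 0]
degraded_F1 = [1, 1, 0, 0, 1, 0, 0, 0]  # one pixel of F1 missing

points = correlation_points([F1, F2], degraded_F1)
print(points)
print(lit_points(points, 0.5))
```

With a lower brightness threshold, for instance 0.2, both points count as lit, because the two shapes share a pixel.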

Of the m correlation points, two or three may be lit at the same time. In that case, the question is which one corresponds to the shape entered into network N.

The experimental setup, which up to this point was only an optical pattern-recognition system, is completed with elements providing nonlinear feedback, in order to make an associative memory.

To this end, a second hologram H2, the conjugate of H1, and a second lens are placed behind plane O, so that the image of the recognized shape forms on a plane N': each point Oi corresponds to a shape Fi. If several correlation points are lit, a superposition of several images forms at N'. In this plane is placed an optical valve V, which thresholds the image; the image is then reprojected (by a system of mirrors, for example) onto the emitting network N (fig. 10). The neural network built in this way then performs a few iterations before reaching a stable state that generally corresponds to one of the shapes recorded on H1.

Fig. 10. - The autocorrelation procedure is closely related to the interference procedure on a hologram, the prototype of a distributed parallel associative memory.

This device comprises a two-dimensional network of neurons N. A shape is entered into memory by lighting certain points (diodes) of N. The beam passes through a hologram on which the memorized shapes are recorded. The beam leaving H1 is focused by a lens L1 onto a screen O. If a single point, Oi, lights up, the input shape is recognized as being Fi. Often several points light up simultaneously. The light from O passes through a second hologram H2 and a lens L2 and forms, on the two-dimensional array of diodes, the image corresponding to the recognized shapes. An optical valve thresholds this image, and the resulting binary image is used to activate or deactivate the points of network N. The process is repeated until one and only one shape is actually obtained. The system always converges to a stable state, generally after a small number of iterations.
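
The complete loop (H1 and its lens, H2 and its lens, the optical valve, then reprojection onto N) can be imitated in software. The sketch below rests on simplifying assumptions that are not in the article: correlation points are plain overlaps, the second hologram rebuilds each stored shape with a weight equal to its correlation point, and the valve keeps the pixels brighter than a chosen fraction of the brightest one. The name holographic_recall and the test shapes are hypothetical.

```python
def holographic_recall(stored_shapes, image, valve_threshold, max_iterations=10):
    """Iterate correlation, reconstruction, and thresholding until stable."""
    state = list(image)
    for _ in range(max_iterations):
        # H1 + first lens: one correlation point per stored shape.
        points = [sum(s * x for s, x in zip(shape, state))
                  for shape in stored_shapes]
        # H2 + second lens: each point rebuilds its shape; images superimpose on N'.
        superposed = [sum(p * shape[i] for p, shape in zip(points, stored_shapes))
                      for i in range(len(state))]
        # Optical valve V: keep only pixels above a fraction of the brightest one.
        cutoff = valve_threshold * max(superposed)
        new_state = [1 if value > cutoff else 0 for value in superposed]
        if new_state == state:
            break
        state = new_state
    return state


F1 = [1, 1, 0, 0, 1, 0, 0, 1]
F2 = [0, 1, 1, 0, 0, 1, 1, 0]
degraded_F1 = [1, 1, 0, 0, 1, 0, 0, 0]  # one pixel of F1 missing

print(holographic_recall([F1, F2], degraded_F1, valve_threshold=0.5))
```

In this example the superposition is dominated by F1, so thresholding removes the faint contribution of F2 and the state settles on F1 after the first iteration.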

Computer simulations and the optical experiments under way make it possible to recognize random-looking binary images, but these devices, based on the Hopfield model, cannot distinguish characters that are strongly correlated with one another.

At Verac (San Diego, California), Bart Kosko is implementing associative memories in which information is stored in holographic form, in volume holograms based on lithium niobate (LiNbO3) crystals. The storage capacity of these devices is said to be on the order of 10^13 bits/cm3.

Although the systems under study are still only at an early stage and cannot be operational for several years, they already look very promising because of the new approach to storing and processing information that they imply. Even if their complexity still remains far below that of the human brain, with its few tens of billions of neurons, associative memories will certainly find interesting applications, from the re-

Perhaps such a memory cor-